APPENDIX A1


TECHNICAL REPORT ON THE TESTS OF ENGLISH FOR ANGLOPHONES


This report is divided into three main sections. The first section 
contains a description of the contents of the English tests for 
Anglophones together with mention of some of the considerations that 
shaped the design of the tests. The second section consists of an 
explanation of the procedure by which the tests were appraised, and a 
presentation and discussion of the results of that appraisal. The 


third section deals with some technical matters related to the tests. 


1. TEST SELECTION AND TEST CONTENT


The Advisory Committee charged with the task of assisting in 
selecting or designing appropriate instruments for evaluating the 
English language competency of Anglophone students took as their 
aim the testing of reading, language, and writing. The test 
package ultimately prepared focussed on those three major 
components of literacy at a level of difficulty appropriate for 
students approaching the Interface. None of the tests addressed 
themselves to specific Grade Twelve or Grade Thirteen course 


objectives. 


1.1 Testing Reading


Available multiple-choice reading tests were searched for passages 
which addressed reading comprehension in some depth, instead of 


settling for something rather closer to recognition of surface 


detail. Questions which demanded a variety of skills (reading for 
main idea or purpose, and reading to see implications and make 
inferences, in addition to reading for literal meaning) were seen 
as ideal. Other goals of the search were interest and variety in 
passage content, since the effect of those qualities on student 
readers' motivation-- and therefore on test validity--was 


recognized. 


Eventually four passages were drawn from the Service for 
Admission to Colleges and Universities bank of tests designed for 
university entrants. All four appeared in the Canadian Scholastic 
Aptitude Test (1973) as a subtest of scholastic aptitude, not as 
an English achievement test. Since they were designed for 
university-bound students in their last year at secondary school, 
it was anticipated that the passages would be rather too difficult 
for most general level Grade Twelve students, but, for purposes 
of comparing Grade Twelve and Grade Thirteen performances, one 


set of passages had to be selected for both groups of students. 


1.2 Testing Language Achievement 


Two subtests were selected for testing the language component of
literacy: (i) the "construction shift" set employed in the
Canadian Test of English Language Form 760 (1974), and (ii) the
"sentence correction" set, with minor emendations, employed in
the Canadian English Language Achievement Test Form 322 (1970),
using the operational items only.


In the "construction shift" type of exercise, the student 
is presented with a sentence which is grammatically correct. He 
is directed to make a specific change in the sentence, and then, 
working within the limitations dictated by that specific change, 
he is to rephrase the entire sentence, retaining the original 
meaning as far as possible. His revised sentence must contain one 
of five optional words or phrases. These are labelled (A) to (E), 
and the student marks the one incorporated in his revision of the 


original sentence. 


In the "sentence correction" type of exercise, the student 
is given a sentence in which a section is underlined. Options (A) 
to (E) show five ways of phrasing the underlined section, 
including--as (A)--the original way. The student is to select the 


clearest and most correct of the five phrasings. 


These two types of language exercise were selected by the 
English Advisory Committee for inclusion in the test package 
because of their high correlation with total essay scores, as 
reported by Godshalk, Swineford, and Coffman (1966). At the same 
time, the Committee recognized that performance on any form of 
multiple-choice language test--even a form like the "construction 
shift", demanding sentence manipulation--was a very dubious 
approximation of the student's performance with language. Hence 


the insistence upon a test of student writing. 


1.3 Testing Writing


The Advisory Committee realized that, to ensure reasonable
reliability and validity, the ideal writing test (Britton, Martin
and Rosen, 1966) would consist of essays in two modes, with no
topic choice, to be written on different days. The writing test
eventually included in the English package involved some
compromises with this ideal.


(a) Test administration constraints allowed only one-half
day for testing writing. Thus it was possible to ask
for only one essay.


(b) In the interest of reliability the single essay was 
restricted to a single mode (expository: argument or 
viewpoint)--the mode judged by the Advisory Committee 
as receiving most emphasis in English Composition at 
the senior secondary level, and most important for 
postsecondary success regardless of the student's area 


of specialization. 


(c) Several topics were offered. Though provision of a wide 
choice was recognized by the Advisory Committee as 
inevitably reducing scoring reliability, it was 
intended, on balance, to increase the writing test's 
validity; presumably a student would be more motivated 
to make a serious effort at arguing his viewpoint if 
the subject under discussion permitted him to choose a 


topic of some interest to him. 


1.4 The Test of Reading Comprehension and Language Achievement 


There were two parallel 40-minute forms of this test, 
designated as Form 1 and Form 2. Each form contained two reading 
comprehension passages with five related items per passage. An 


analysis of the items (combining Forms 1 and 2) ran as follows: 


(a) items dealing with literal meaning =) Te} 

(b) items dealing with main idea (implied) - 2; 

(c) items dealing with inference, implication - 7; 

(d) miscellaneous items = es 
Each form also contained eleven "construction shift" items 


followed by fifteen "sentence correction" items. 


Test administration conditions required some Grade Thirteen
physics students to write the physics test in the second half of
the morning testing period. Also in that time period, a
sub-sample of students wrote the essay. Students writing neither
the essay nor the physics test took both forms of the Test of
Reading Comprehension and Language Achievement; those writing
physics or the essay were randomly assigned to either Form 1 or
Form 2.


1.5 The Writing Test 


The sub-sample of Grade Twelve and Thirteen Anglophone 
students writing English tests were given 1-1/4 hours to write a 
250-350 word essay on one of eight topics. They were given the 


following instructions: 


Develop your position with reference to the topic
chosen, supporting it with carefully chosen
illustrations and carrying it forward to a logical
conclusion. Feel quite free to agree, disagree, or
take an intermediate position with respect to the
point of view stated or implied by the topic.


2. APPRAISAL OF THE TESTS


Both forms of the Test of Reading Comprehension and Language 
Achievement (English) and the Writing Test were sent to teachers 


of Grade Twelve and Grade Thirteen English courses in the 53 
Anglophone schools in the study and to a selected number of the 
instructors of first year English courses in 15 Colleges of 
Applied Arts and Technology and in 11 universities. In addition, 
each of these individuals was sent an appraisal inventory for the 
tests. Secondary school teachers were asked to appraise the tests 
with reference to students taking the following English courses: 
Grade Thirteen, Grade Twelve Advanced, Grade Twelve General and 
Grade Twelve Basic. Postsecondary teachers were asked to respond 
to the inventory with reference to first year regular and/or 


remedial English programs. 


The secondary school edition of the inventory consisted of 
forty questions and the postsecondary edition of thirty-two. All 
thirty-two of the questions on the postsecondary edition appeared 
also in the secondary edition with certain significant differences 
in phrasing. For instance, where secondary instructors were asked 


to consider student performance upon completion of their courses, 


postsecondary instructors were asked to consider the performance 
of students entering their first year courses. For ease of 
comparison, the responses of instructors at both levels will be 


reported here in parallel. 


The additional questions included in the secondary school 
inventory referred in detail to specific passages in the reading 
test. Only overall impressions were solicited from postsecondary 


teachers. 


All questions called for response by number for coding and 
computer print-out; frequent opportunities to supplement coded 
information with written responses were provided. Teachers could 
respond in the code columns for the one, two, three or four 
courses they taught, and then add comments appropriate to one or 
more of the courses. The comments were read and summarized by 


Project II staff following the keypunching of the coded responses. 


2.1 Nature of the English Inventory


The following limitations and purposes of the appraisal 


inventory should be clarified at the outset: 


(a) Because testing was limited only to reading, language, 
and writing competency (the last restricted further to 
only one mode), the test inventory was similarly 
restricted. Many important areas of program had to be 


left aside: for example, spoken English and literature. 


(b) The question of how best to test competences was a
major concern of the inventory, every bit as important
as the question of appropriateness of objectives tested
or of difficulty level. It was hoped that the
appraisals might throw some helpful light on issues
much debated among teachers, such as the utility of
multiple-choice English tests or reliability in
evaluating writing samples.


(c) The English appraisal inventory did not line up
individual items from the multiple-choice tests against
individual specific course objectives, inquiring, "Did
you teach this?" in the way that mathematics or physics
appraisal inventories could and did. Most items testing
language--and reading too, to a considerable
extent--did not test for a single skill. Instead, the
sum of items was a sort of "blitz" of these areas of
literacy, and the inventory dealt with them as such.


The following example of a "sentence correction" item
illustrates the combination of language skills required by such
items:

The climax is when the hero stabs the villain.
(A) is when the hero stabs the villain
(B) is where the hero stabs the villain
(C) occurs when the hero stabs the villain
(D) occurs when the hero stabbed the villain
(E) is the hero has stabbed the villain


The selection of the correct answer, "C", depends not only
on the awareness that "when" and "where" are inappropriate
conjunctions for beginning the noun clause (eliminating choices A
and B) but also on an awareness of tense sequence:
"occurs...stabs" rather than "occurs...stabbed". While the
combination of otherwise unrelated grammatical concerns does not
render the test item inappropriate, it does make link-up between
test item and a specific language objective very difficult. The
student who selects (D) or (E) is partly right in that we may
assume he has identified and avoided one type of error, but his
being wholly right depends upon awareness of a second type of
error. It should be noted as well that the practice item used
here for illustration is considerably simpler to unravel than many
of the more complex items in the actual test.


2.2 Responses


Altogether there were 136 respondents for Grade Thirteen, 115
for Grade Twelve advanced, and 93 for Grade Twelve general. Only
11 teachers responded for Grade Twelve basic level; therefore, in
instances where a report of these 11 responses, given as a
percentage, would lend the appearance of far more precision and
consensus than the data warrant, "basic" level responses have
been deleted.


Teachers were requested to respond to the inventory for each 
grade and level taught. Consequently many responded for two or 
three levels, and provided a mixture of written responses, some 


generally applicable and some specific to a level. 


A total of 34 CAAT and 25 university teachers responded to 
the appraisal inventory. They were asked to respond for "regular" 
first year or "remedial" or both, depending upon the courses they 
taught. However, as only two university respondents indicated any 
involvement with remedial programs, their responses were combined 


with other university first year responses. 


There appeared to be some confusion on the part of
postsecondary appraisers concerning response columns "regular" and
"remedial"; consequently there was some crossing of data which
was difficult to sort out. Since university responses were
recorded as one total, the confusion is not an issue there, but
it does remain a problem in interpreting CAAT responses with
desirable precision. Total responses for CAAT regular English
programs were 28 and, for remedial, 16.


2.3 Interpreting Recorded Responses 


In view of the numbers responding for secondary schools and in an
attempt to facilitate a reasonably straightforward comparison
between viewpoints of secondary and postsecondary teachers,
responses have usually been recorded as percentages, except where
means (as in assessment of difficulty by use of a scale) seemed
more appropriate. Unless otherwise noted, then, answers in the
following tables are expressed as percentages.


Because there were, relatively, so few postsecondary
responses, and because the postsecondary instructors who were
invited to respond were chosen on arbitrary grounds and as a group
cannot be said to represent a probability sample of postsecondary
instructors across the province, one must be careful not to
interpret percentage scores with a precision and generality they
do not deserve. It would be inappropriate to consider the
percentages here as any firm consensus of postsecondary teachers of
English or to judge differences such as that between 80 per cent
and 85 per cent as being particularly significant.
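
The scale of this caution can be illustrated with a rough calculation. The sketch below takes the respondent counts reported earlier (25 university and 34 CAAT teachers) and applies the textbook normal-approximation margin for a proportion. Since the postsecondary respondents were not a probability sample, the margin is only an indication of scale and not a valid confidence interval; the function names are hypothetical and are not part of the original study.

```python
import math


def respondents_for_gap(gap_percent, n):
    """How many respondents a percentage-point gap represents in a group of n."""
    return gap_percent / 100 * n


def approx_margin(percent, n, z=1.96):
    """Normal-approximation margin, in percentage points, for a reported percentage.

    Assumes simple random sampling, which the report states does NOT hold
    for the postsecondary group; treat the result as illustrative only.
    """
    p = percent / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


univ_n, caat_n = 25, 34  # respondent counts reported in the text

gap_univ = respondents_for_gap(5, univ_n)   # 80 vs 85 per cent among 25 people
margin_univ = approx_margin(80, univ_n)
margin_caat = approx_margin(80, caat_n)

print(f"A 5-point gap among {univ_n} respondents is about {gap_univ:.2f} people")
print(f"Illustrative margin at 80 per cent, n={univ_n}: +/-{margin_univ:.1f} points")
print(f"Illustrative margin at 80 per cent, n={caat_n}: +/-{margin_caat:.1f} points")
```

On these assumptions a five-point difference among the university respondents amounts to little more than one person, which is why such differences should not be read as meaningful.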


The order of questions as they appeared in the inventory has 
been re-arranged somewhat in this report. Response columns have 


been designated as follows: 


13 - Grade 13;
12A - Grade 12 Advanced;
12G - Grade 12 General;
12B - Grade 12 Basic;
CAAT - First year CAAT ("Regular" and "Remedial" are
split when appropriate);
UNIV - First year of university.


2.4 Reading Comprehension 


It is apparent from the responses to the following eight items 


that both postsecondary and secondary school teachers across the 


grades and levels involved saw the testing of reading 
comprehension as a very important component in the evaluation of 
language achievement. Many saw competence in deriving literal 
meaning, deriving main idea, and deriving inference or 
implication as important on entry to the grade level and as 


important objectives in the program. 


Do you consider that a test of reading comprehension assesses
an important component of language achievement?
13 12A 12G 12B CAAT UNIV


Yes “Le, 96 Py 22) 94 100 


How many of the students entering courses at this level should
have the ability to read a passage for literal understanding?
13 12A 12G 12B CAAT UNIV
100% 16 61 47 15 Tig 83
76%-99% 1s, 18 19 38 1? Li,
51%-75% 5 isp) 18 2 5) 0
26%-50% Z 3 uh 8 6 0
1%-25% 0 3 4 8 0 9)
None 0 0 0 0 0 0




How many students on entry should be able to identify the main
idea or purpose of a passage?

100%
76%-99%
51%-75%
26%-50%
1%-25%
None


How many students on entry should be able to draw inferences
and see implications?

100%
76%-99%
51%-75%
26%-50%
1%-25%
None


pial 


12A 


12G 


CAAT 


CAAT 


UNIV 


UNIV 


What emphasis in 
this course is 
given to reading 


for literal 


understanding? 


heavy 


moderately heavy 


light 
remedial only 


none 


What emphasis in this course is given to reading to identify
the main idea or purpose?


heavy 


moderately heavy 


light 
remedial only 


none 


13 


13 


eZ 


34 
ZZ 


12A 


36 
10 


12G 


D2 
16 


62 
By 


12B CAAT REG. CAAT REM. UNIV


Mae: 44 50 36 
54 dl Zo 20 


23 13 0 24 
0 13 (ae) £2 
0 Oe eee 8 


12B CAAT REG. CAAT REM. UNIV


76 De 67 De 
a 34 yy 40 


) 6 8 0 
0 6 8 4 
0 0 0 4 


What emphasis in this course is given to drawing inferences
and seeing implications?
13 12A 12G 12B CAAT REG. CAAT REM. UNIV
heavy 64 41 16 8 Z> 8 64
moderately heavy oie) 2H 4l 42 44 50 28
light 5 7 oF) 53 28 33
remedial only 0 1 8 i] 3 8
none 1 1) 1 0 0 0


In response to a further question, "Are there important
reading skills that were not tested but that should have been?" a
high proportion responded affirmatively: University - 50 per
cent; CAAT - 29 per cent; 13 and 12A - 40 per cent; 12G - 36
per cent. The range of suggestions was quite wide. The four
appearing with greatest frequency were vocabulary; inference or
implication (perhaps implying that the test items did not go far
enough); [...] (for structure, use of evidence, logic, bias, etc.);
and appreciation (presumably of literary qualities). Only about
ten of those responding suggested testing for rate.


It is reasonable to conclude that, in the opinion of both
secondary and postsecondary teachers, the test of reading
comprehension attacked quite appropriate objectives, though it
would have been desirable to expand and deepen them somewhat,
focussing on critical awareness of specific aspects of language
and style.


Regarding the suitability of reading passages, secondary
teachers were asked both general questions and questions about
each passage; postsecondary teachers were asked the general
questions only.


From the standpoint both of expectations concerning type of
material students should be able to read and of appropriateness of
level of difficulty, there was a high level of agreement between
Grade Thirteen and university teachers and between CAAT and Grade
Twelve advanced teachers. Teachers of Grade Twelve general
students considered the passages too difficult and teachers of
Grade Twelve basic level considered the passages far beyond the
students' range.


Are the four passages representative of the material you would
expect students at this level to be able to read with
comprehension?
13 12A 12G CAAT UNIV
Yes hs 20 14 By 80




What is your overall assessment of the difficulty level of the
four passages given in the two forms of this test?
13 12A 12G 12B CAAT UNIV
too easy 1 0 i 0 3 0
somewhat easy 5) 5) a 0 3 0
about right 45 a 11 i 26 vi
somewhat difficult 42 40 36 alee 47 29
too difficult 8 yA3) ohh 80 Pat 0
Mean: Sa UL GOK BoM toe eb esse /oe SS GU 35.229


Only the secondary school teachers were asked to respond
concerning the specific passages. They were asked to make this
assessment using the range from 1 (too easy) to 5 (too difficult).
Their responses are reported here as means:


13 12A 12G
Passage on Mass Marketing 3.14 Eee 4.10
Passage on Viviparity ane 7 ie e's pat oh
Passage on Object-Perception Pao 4.23 4552
Passage on Mackenzie King Diet D.6k aay 4 |)
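
As an illustration of how a mean on the 1-to-5 difficulty scale relates to a percentage distribution of responses, the sketch below computes a weighted mean. The distribution used is hypothetical and is not taken from the survey; the function name is likewise an assumption made for the example.

```python
def scale_mean(percent_by_rating):
    """Weighted mean rating, given the percentage of respondents at each scale point.

    Dividing by the actual total (rather than 100) keeps the result sensible
    when rounded percentages do not sum exactly to 100.
    """
    total = sum(percent_by_rating.values())
    weighted = sum(rating * pct for rating, pct in percent_by_rating.items())
    return weighted / total


# Hypothetical distribution: 1 = too easy ... 3 = about right ... 5 = too difficult
example = {1: 0, 2: 5, 3: 50, 4: 35, 5: 10}

mean_rating = scale_mean(example)
print(f"Mean difficulty rating: {mean_rating:.2f}")
```

A mean above 3 indicates that, on balance, respondents judged the material harder than "about right"; a mean near 4 corresponds to "somewhat difficult".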




Is the passage appropriate except for difficulty?

                              13   12A   12G
(a) Mass Marketing: Yes       82    81    73
(b) Viviparity: Yes           58    56    61
(c) Object-Perception: Yes    58    56    54
(d) Mackenzie King: Yes       87    80    74


It is of interest that while most teachers found passages of
sociological commentary ("Mass Marketing") and historical
commentary ("Mackenzie King") appropriate, those on scientific
topics were considered inappropriate by a substantial proportion
of respondents. While a few made criticisms regarding style or
remoteness from student interests, the strongest criticisms,
almost exclusively restricted to "Viviparity" and
"Object-Perception", were of the technical and abstract vocabulary,
which was considered as seriously disadvantaging the non-scientific
student. It should be noted, however, that though criticism on
this ground occurred in at least 30 per cent of the responses,
score results indicate that students did slightly better on the
"scientific" passages than on the other pair. Some teachers
recommended passages from literature or passages of higher
literary quality as alternatives to the "technical" ones.


The criticisms raise important questions. On tests of
reading at the senior secondary level, is it appropriate or fair
to include topics which are rather specialized? Should secondary
education be training students to read materials in a wide range
of topic areas? If so, should the responsibility fall principally
on the teacher of English?

The postsecondary teachers, though they had not been invited
to comment on specific passages, made a few general suggestions:
that the range of passages be broader, and that controversial
passages (such as "Mackenzie King", presumably) be avoided. A few
noted that the present passages lacked interest.


Respondents were asked also to comment on the
multiple-choice format for testing reading comprehension.


Is the multiple-choice format a reasonable method of assessing
at least the three reading comprehension abilities mentioned?

                   13   12A   12G   CAAT   UNIV
Yes                40    46    41     62     40
Yes (qualified)    53    49    50     33     48
No                  7     5     9      4     12


Over 90 per cent of the respondents reacted favourably to
the multiple-choice format for reading comprehension though
approximately 50 per cent had serious qualifications. These
concerned test conditions such as the time allowance (judged
inadequate), testing pressures, or guessing as a factor; the
wording of items; and certain inadequacies of the test. Its
failure to measure more complex understandings or subtle
distinctions was noted by some, and some expressed the need for a
response from the student "in his own words". In fact, almost
all of the criticisms pointed to a need for some written response
on the part of the student.




Thus, one might conclude from the appraisal of the reading
test that teachers of English at all levels would be reasonably
satisfied with a test of reading comprehension combining
multiple-choice and written responses and restricted to reading
passages in general interest areas, not necessarily "English".


2.5 Language Achievement


Respondents at both levels were asked to consider the objectives 
tested by both the "construction shift" and "sentence correction" 


language exercises. 


What is your evaluation of the language achievement items in
the tests from the standpoint of emphasis given to usage, style,
grammar, structure, and idiom?
13 12A 12G CAAT UNIV

(1) They provide a reasonable balance in testing important
areas of language achievement. 63 63 70 81 hls

(2) Though testing a number of important areas, there is an
imbalance in emphasis. 17 v2 1D 6 4

(3) Important areas of language achievement are omitted or
tested too lightly. 19 24 14 CB) 25


Those responding (2) or (3) referred less to balance and emphasis
than to areas of language competence they felt should also have
been tested. These complaints included the neglect of crude as
opposed to subtle errors (e.g. sentence fragment) and the failure
to test mechanics such as punctuation and spelling. But the
infrequency of these criticisms leads to the conclusion that,
overall, there was a strong general consensus that the test
addressed itself to matters of language for which competence was
expected for a large proportion of students by the end of the
secondary program: 83 per cent responding for Grade Thirteen
believed these competences should be achieved by 75 per cent or
more of the students. The figure for Grade Twelve advanced was
72 per cent of the respondents, and for Grade Twelve general, 45
per cent. In comparison, 69 per cent of the CAAT teachers and
64 per cent of the University instructors expected these
competences from 75 per cent or more of the students on entry to
those institutions. (Please note: In all instances "competence"
means competence in the areas of language tested and not
necessarily near perfect performance on the test items.)




How many students should have the competences assessed by the
language achievement items upon entry to your course?
(Postsecondary: "institution" replaces "course")

100%
76%-99%
51%-75%
26%-50%
1%-25%
none


How many of the students who successfully complete English
courses at this level should have the competences assessed by the
language achievement items?

100%
76%-99%
51%-75%
26%-50%
1%-25%
none


Hi)


34
24


13


Bye)
1


by)
18


12G CAAT
Lo 4]
15 28
Jays) 13
20 )
10 »
D 6


UNIV


12G CAAT REG. CAAT REM. UNIV
17 28 38 70
28 ol emt 13
Dili 24 6 9
le 16 eZ 0
2 0 6 4
3 ih 6 4


Further, as the following table demonstrates, the test
focused on aspects of language considered important across the
Interface and therefore given considerable emphasis at every
level.


How much emphasis do you give to developing the competences
demanded by the language achievement items?
13 12A 12G CAAT REG. CAAT REM. UNIV
none--competences are unimportant il Z i 0 0 8)
heavy 16 12 10 Ha 65 20
moderately heavy 43 50 45 50 12 NPA
light 22. iL 15 i i, le? 56
remedial only Jags 13 il) 13 PZ 26
none--competences are too advanced 2 ¥ 7 0 6 0


Respondents were asked to assess the two item types used in 
the language achievement test and the difficulty level at which 


they were set. 


What is your evaluation of the "construction shift" items?
13 12A 12G CAAT UNIV
Suitable but too easy 8 9 Z 0 0
Suitable and right for difficulty eo 49 3D 34 60
Suitable but too difficult 10 16 42 24 8
Unsuitable for reasons other than difficulty 27 Za Lo BZ By:


What is your evaluation of the "sentence correction" items?
13 12A 12G CAAT UNIV
Suitable but too easy 1s) 7 Z 0 4
Suitable and right for difficulty 67 ae. 47 76 86
Suitable but too difficult 10 1S 38 18 0
Unsuitable for reasons other than difficulty 9 4 N62 6 9

Those responding with any qualification about the item types
were encouraged to comment further.

Under "suitable", very few respondents had negative
comments to make concerning the "sentence correction" type.
Indeed, it received a number of gratuitous favourable comments.

"Construction shift", however, was not so favourably
received. The format and instructions for this item type were
considered by many from both secondary and postsecondary levels to
be very confusing for students; a number of secondary school
teachers claimed that the student could not determine what he was
asked to do. The relatively high percentage of students failing to
complete this section of the test adds weight to this criticism:
the average completion per item for the construction shift by
Grade Twelve students was 82 per cent, while for the sentence
correction type it was 97 per cent.


The situation might have been ameliorated considerably if
copies of the Student Handbook had been received earlier at the
schools, and students and teachers had been able to practise and
discuss the sample items in a relaxed atmosphere. The need for
prior practice seems most important if the "construction shift"
item is to be used in future. Possibly, also, more time should
be allowed for this type of exercise.


A number of teachers questioned what the "construction
shift" was intended to measure, a question not raised for the
"sentence correction" type of item. Evidently appraisers were
less impressed than the Advisory Committee had been with the fact
that in the "construction shift" exercise the student himself must
manipulate sentence elements, a task not called for in any other
available multiple-choice item type.

Respondents were asked for an overall evaluation of the
difficulty level of the language achievement parts of the test
(1--too easy; 5--too difficult).

13 12A 12G CAAT UNIV
too easy 3 i 0 0 9
somewhat easy a Y/ 0 3 ie,
about right D2 Da 26 38 61
somewhat difficult Zo Sil by) DS) 17
too difficult 2 10 3 6 6
Mean Bie val STA 4.10 Sap pl Pep ps is)


It is interesting to observe here that the teachers of 
advanced Grade Twelve generally saw the test as close to 
appropriate in difficulty (despite the general concern, noted 
above, concerning the "construction shift"), whereas general Grade 
Twelve teachers saw the test as too difficult. The means of the 
university and Grade Thirteen responses are approximately 
equidistant from (3) "About right". The responses suggest that 
tests of this order, while appropriate for the university-bound 
students for whom they were originally designed, may be 


inappropriate for the senior student without university ambitions. 


Regarding the suitability of multiple-choice format and 
alternatives, the following questions were asked in the context 
of the combination of the language test with (for a sub-sample of 


students) the essay. 

Do you think a test in multiple-choice format is in general a
satisfactory means of assessing students' language competence?

Yes
Yes (qualified)
No


Where multiple- 
choice tests are 
used, should they 
be supplemented by 


other measures? 


Yes 


What importance do 
you place upon a 
sample of the stu- 
dent's writing in 
an evaluation of 


language competence? 


essential 


important but 


not essential 


ho 


28 


44 
28 


(eS) 


87 


1 


a5 


Zi 


90 


89 


iy 


12G 


36 


37 
Zh 


12G 


86 


18 


CAAT 


24 


61 
15 


CAAT 


82 


CAAT 


UNIV 


eZ 


76 
2 


UNIV 


96 


UNIV 


of minimal im- 
portance and 


utility 


neither im- 
portant nor 


useful 


How do you regard 
the use of both a 
multiple-choice 
test of language 
achievement and 

a sample of writing 
in assessing lan- 


guage competence? 


The multiple-choice 
test is satisfactory 
by itself. 


The use of both 


is important. 


The writing sample
is satisfactory
by itself.


Neither is partic- 


ularly satisfactory. 


18) 


69 


28 


26 


12A 


73 


1a, 


1G 


68 


26 


CAAT 


80 


ey 


UNIV 


84 

If a student's relative standing on a writing test were different
from the student's relative standing on a multiple-choice test of
the type administered in this study, which would you consider the
most valid measure of the student's language competence?

                                          13   12A   12G   CAAT   UNIV
the score on the writing test             49    53    56     44     56
the score on the multiple-choice test      3     0     2      0
a combined score weighted in favour
of the writing test                       42    43    39     38     40
a combined score weighted in favour
of the multiple-choice test                3     2     0      3
a combined score giving equal weight
to both tests                              5     3     3     15

A significant percentage responding to these items strongly
objected to the use of multiple-choice tests at all as a measure
of language achievement, the higher proportion of strong negatives
being found at the secondary panel. As well, a very substantial
percentage gave only qualified approval of their use (over 40 per
cent secondary, 61 per cent CAAT, and 76 per cent university).
Virtually no one regarded a multiple-choice test of language
achievement as by itself adequate as a measure of student
competence in using language. In fact, a substantial number saw
the writing sample as satisfactory by itself (range here was from
15-28 per cent of responses), and an overwhelming proportion of
all groups regarded the writing test score as capable of standing
alone or, if in combination with the multiple-choice test, with
extra weight given to the writing test. The percentages here
were: 13--91 per cent; 12A--96 per cent; 12G--95 per cent;
CAAT--82 per cent; UNIV--96 per cent.


One rather obvious conclusion should be drawn concerning 
present or future testing of language competence at the senior 
level: secondary school, CAAT, and university teachers of English 
do not think multiple-choice testing valid as the sole indicator 
of achievement; writing must also be tested. At the same time, 
it is reasonable to conclude that many teachers would find 
acceptable a multiple-choice test in combination with writing. 
One evaluation issue, therefore, that needs to be systematically 
addressed is a method for increasing the reliability and validity 


of scoring students' written work. 


Respondents' written comments offered a variety of
suggestions regarding the multiple-choice format, such as the
addition of some test of oral competence or tests of sentence 
style and structure wherein the student would compose his own 
sentences. A substantial number of respondents asked that 
multiple-choice tests provide for short "essay type" answers to 
questions, though the kinds of questions to be answered in that 


form went unspecified. 
Some criticism was directed against the complicated 


instructions for the "construction shift" items as noted above; a 


number responding for the general level student saw the test as 
too difficult and "painful" for these students, and rather
irrelevant as well. Scattered criticisms were made of testing 
conditions, the pressure of time and the guessing factor. An 
infrequent but interesting comment pointed out the total lack of 
context for sentences to be corrected or improved. Standing 
alone, they appeared as purposeless exercises, divorced from 
meaningful communication situations. The absence of context would 
appear to be a constraint on validity, and a factor to consider 


in future test construction. 


The major specific criticisms, occurring with high
frequency, were these:


(1) The lack of any necessary connection between 
performance on these exercises and the way in which the 
student actually uses language in a variety of 


situations. 


(2) The tendency of these tests to overemphasize 
artificial, overly subtle, or debatable features of 
language, with the concurrent tendency to stress
awareness of rules rather than performance. Likewise, 
the tendency to stress errors and error-correction 


rather than presentation of effective written English. 


(3) In providing answer choices (applicable in this test
only to the "sentence correction" items), the failure 
to test the student's ability to construct his own 


sentences. 


On this last point, a substantial number recommended open-ended 
items wherein the student would write corrected sentences from 


scratch rather than be cued by the options. 


The above criticisms strongly suggest that the state of the 
art of multiple-choice language testing is presently 
unsatisfactory. Teachers are evidently not convinced of the 


validity of this very indirect measure of language performance, nor
are they pleased with the kind of emphasis these tests encourage
in approaching the study of language.


2.6 The Essay 


As must already be apparent from the previous group of responses, 
there was a strong consensus that some sample of actual writing, 
principally the essay, ought to be included in any test of
language competence. It was in anticipation of and in agreement 
with this conviction that the Advisory Committee made sure that 
at least a sub-sample of Anglophone students being tested for 
English language competence wrote essays. In the inventory for 
the Writing Test (the essay), respondents were asked to consider 
the single mode (expository) chosen for the essay test: its 


suitability and its importance. 


How many students 
should be able to 
write an acceptable 
essay of this type 
upon entry to 
English courses at 
this level? (Post- 
secondary inventor- 
ies phrased it as 
"entry to your 


postsecondary institutions".)

              13    12A   12G   12B   CAAT   UNIV
100%          66   355)    24     2     45     80
76%-99%      IS}     38    16   oY)     30     16
51%-75%        9     23   56)   Ha)     15      4
26%-50%       uf      1     7     0      0      0
1%-25%         1      1     v    iy      0      0
none           0      0     0     0      0      0
How much emphasis
do you give in your 
teaching to the de- 
velopment of stu- 
dent competence in 
this mode of writ- 
ing? (For post- 
secondary: 

"in your teaching 
of first year 


courses". ) 


heavy 
moderately heavy 
light 


How many of the stu- 
dents who success- 
fully complete En- 
glish courses at 

this level should be 
able to write an ac- 
ceptable essay of 

the type required in
the test? (For post- 
secondary: "the 
first year English 


course you teach". ) 


100% 
76%-99%
51%-75%
26%-50% 
1%-25% 


none 


iS) 
I:


31 


EZR 


45 


32 
46 


12G 


12B CAAT CAAT UNIV 
REG. REM.


28 Be; 73 44 
33 48 i 22 
0 0 fi 0 


12B CAAT CAAT UNIV 
REG. REM.


ey 47 47 88 
67 a7 20 IZ 


LE 16 7 0 
8) 0 20 ) 
0 0 0 

0 0 6 0 


The Advisory Committee's judgement that this mode of 
writing is important in Grade Twelve and Grade Thirteen English 
at all levels was well supported. Moderately heavy to heavy emphasis
was indicated by 90 per cent of the responses for Grade Thirteen,
97 per cent for Grade Twelve advanced, 78 per cent for Grade
Twelve general, and 91 per cent (admittedly of a very small number
of responses) for Grade Twelve basic. Competence by the end of the
program was expected of 75 per cent or more of the students at any
level according to the large majority of the respondents, and
according to all those responding for universities. It is interesting
also to note that, according to these responses, this mode of 


writing receives quite heavy emphasis in postsecondary programs. 


Are there other modes of writing which from the standpoint of
general literacy are as important as, or more important than, the
mode being examined?

                  13   12A   12G   12B   CAAT   UNIV
No                50    58    48    --     48     54


Was the present assignment at a reasonable level of difficulty for
students in courses at this level?

                  13   12A   12G   12B   CAAT   UNIV
Yes               94    92    86   jie     94    100


Was the restriction to a single mode fair to students in a test of
writing competence?

                  13   12A   12G   12B   CAAT   UNIV
Yes               81    78    71    62     66     ae


Given the restriction to a single mode, what is your opinion of the
range of topics?

                  13   12A   12G   CAAT   UNIV
Good              58    55    47     38     40
Satisfactory      36    40    40     53     48
Unsatisfactory     6     5    13      9     12


Teachers were strongly agreed that the difficulty level was 
reasonable and that the topics selected and the range of topics 
were satisfactory. There was some criticism of the restriction to 
a single mode on the grounds that not all students could
demonstrate their best writing when confined to this mode, but, 
conceding that limitation, it does appear clear that, of the 
modes that might have been selected, the choice made was the 


soundest one. 


When asked in a further question whether there were other 
modes which, from the standpoint of general literacy, were as 
important as, or more important than, the mode selected, between 
48 and 58 per cent across the programs and panels represented 
agreed there were not. Those who disagreed gave a very wide range 
of responses, including the following: the essay in literature; 


reports; letters; personal, creative or "free" writing; dialogue 
or drama; description; narration; poetry; reviews; and the formal


essay. 


Of these, narration and description received the most 
frequent mention but no particular response occurred more than 15 
per cent of the time, with two exceptions: for business 
communications students at the Grade Twelve general and CAAT 
levels, letter writing was strongly recommended; and for CAAT 
students generally, informative descriptions of processes and 


proceedings were frequently suggested. 


Teachers were asked to rank in order of importance five 
general criteria for evaluating the essay, and to suggest any 
others they considered of high importance. The five criteria 
were: organization; logic, use of evidence; style (chiefly 
sentence style); grammar, usage, mechanics; and diction. Some 
respondents regarded the question as pointless, claiming either 
that the criteria given were too general or that all were of 
equal importance. Some, especially among university respondents, 
gave all five a "1" (high) rating. Consequently, all that comes 
clear from responses to this item is the strong priority given 
organization and logical argument over the other general criteria 
across all grades and levels. It might be noted also that 
"grammar, usage, mechanics" was given priority over "style" at


the postsecondary level. 


In suggesting additional criteria, many teachers reinforced 
the importance of organization by specifying it further in such 
terms as "coherence", "effective introduction and conclusion",
etc. A wide variety of other criteria were mentioned but, with
one exception, with very low frequency. The one additional
criterion which occurred with high frequency (perhaps in 25 to 35
per cent of responses) and in many guises is perhaps best
described as "creativity". Evidently, many teachers regard
"originality", "vitality", "imaginativeness", etc. as quite as


important as more formal and more easily defined criteria. 
It should be noted that the essays were in fact scored
"holistically" (i.e. by general impression) rather than against a 
set of formal criteria. The tendency of teachers to give major 
characteristics approximately equal weight provides some support 


for the procedure employed. 


For the holistic scoring process, scorers were not provided 
with a set of criteria to apply. However, following a practice 
run on a small batch of essays, all scorers were asked to 
identify in rank order the five or so criteria they found 
themselves applying. The two criteria most frequently noted were 
organization (almost always in the first or second rank), and 
logic or development of argument (most frequently in the second 
rank). "Grammar, usage, mechanics" was the criterion noted third 
most frequently, and ranked on the average fourth. Creativity and 
its approximate synonyms also occurred quite frequently, as did 
diction. In short, the criteria and their ranking as applied by 
the scorers appear to correlate closely with the views of the 


teachers as expressed in the inventory. 


Following the holistic scoring, a sample of the essays was 
studied for error counts (grammar, usage, mechanics, etc.) and 
further characterized from the standpoint of these other more 


general characteristics. 


In various contexts throughout the inventory, suggestions 
for ways of increasing the validity of measuring language
competence were solicited from respondents. Some, though the
number was not large (about 10 per cent), recommended increased
frequency of testing, an increased battery of tests, or both. A
substantial number recommended that oral language be tested in
some manner, though there were few specific suggestions. Very few 
indeed recommended tests of formal grammar. A significant number 
suggested the précis and the appreciation, no doubt in recall of 


the external examinations in Grade Thirteen. 




The dominant response, however, was the recommendation of 
more frequent samples of writing in a mixture of modes. This 
response came forward again in even stronger terms from those 


making concluding or summary remarks. 


Here are some typical comments underlining the importance
of the testing of writing:


"A fair test of writing, but there is a
need for a number of samples in various
modes over time."


"The writing test is far superior to
multiple-choice where there is much
guessing, no application, little sense of
accomplishment."


"The essay is most valid because it measures
organic relationships."


"The writing test is the most effective way
of evaluating the student--but problems in
subjectivity."


There were a large number of somewhat hostile or negative 
comments from secondary school teachers, few from postsecondary. 
Apart from those comments directed against multiple-choice tests 
on principle and those noting the difficulty of the test battery 
for general and basic level students particularly, most centred on 
the timing (the point in the school year) of the tests, the lack 
of adequate time for preparation on the part of staff and 
students, the impersonality and apparent lack of direct value of 
these tests to the students taking them, or insufficient time for 
students to complete the tests adequately. While this sort of 
criticism appears fair enough, it is somewhat beside the point of 


the testing issues to be considered. 
On the other hand, there were numerous favourable comments
about both the multiple-choice and essay tests with 
recommendations for improvement. In particular, there appeared to 


be considerable support for formal testing at the senior level in 


English. 


2.7 Conclusions


Generally, appraisers across all levels offered strong support for 
the use of multiple-choice tests as measures of both reading and 
language competence, provided that these tests be supplemented by 
a sample--or, preferably, samples--of student writing. Appraisers 
felt students should be given the opportunity to write in more 
than one mode, but that the "point of view" essay was the most 
important single mode at the interface level. Further, the 
principal criteria for evaluating this mode of writing at this 


level were organization, logic, and "creativity". 


Regarding the specifics of multiple-choice language tests,
appraisers offered critical comments regarding test conditions,
wording of instructions, absence of context and emphasis on errors in
certain item types, and a general lack of opportunity for students
to create their own sentences.


3. TECHNICAL ISSUES


3.1 Scoring the Test of Reading Comprehension and Language
Achievement (English)


Both forms of this multiple-choice instrument were scored to 
correct for the effect of guessing. The rule that was employed in 
this correction was as follows: one mark was awarded for each 
correct answer, one-fourth of a mark was deducted for each 
incorrect answer, and all questions for which no answer was given 


were ignored. 
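
The scoring rule just described can be sketched in a few lines. This is an illustrative sketch only, not the program used in the study: the function name `corrected_score`, the use of `None` for an unanswered question, and the generalization of the one-fourth penalty to 1/(k-1) for k answer options are all assumptions made for the example.

```python
from fractions import Fraction


def corrected_score(responses, key, n_options=5):
    """Score corrected for guessing.

    One mark per correct answer, 1/(n_options - 1) of a mark deducted per
    incorrect answer (one-fourth for five options), unanswered questions
    (None) ignored. The n_options parameter is an assumption of this sketch.
    """
    penalty = Fraction(1, n_options - 1)
    right = 0
    wrong = 0
    for given, correct in zip(responses, key):
        if given is None:
            continue  # no answer given: neither rewarded nor penalized
        if given == correct:
            right += 1
        else:
            wrong += 1
    return right - penalty * wrong


# Hypothetical five-item example: three right, one wrong, one blank.
key = ["A", "B", "C", "D", "E"]
answers = ["A", "B", "D", None, "E"]
score = corrected_score(answers, key)
```

With this rule a student who guesses at random on five-option items gains, on average, nothing from guessing.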
3.2 Scoring the Writing Test


The essays were to be appraised holistically by three independent 
scorers on a scale of 1 (lowest) to 10 (highest), whole numbers 
to be used. As can be seen in the "Instructions to Scorers"
(presented as Appendix A1A of this technical report), holistic
scoring, as described in Britton (1966) and NAEP (1972), is a
rapid impression score: the total effect of the essay on the 


reader. No sub-scores on specific criteria are requested. 


Each scorer was asked to read a number of the essays in
order to establish use of the scale in a preliminary way and then
to make an early adjustment if he found that he was not using the
total scale effectively. Our only insistence was that each scorer
use every category of the scale: some essays had to be rated
"1", some "10" and so on. We did not insist or request that the
essays be equally distributed across the scale. Thus essays were
appraised in relation to each other rather than against an
external criterion.


Consistent with this method, direction to scorers was kept 
to a minimum (see Appendix A1A). Each scorer was given a copy of
the instructions to students, a copy of the essay topics, and a 
general instruction (#3) to appraise from a positive standpoint 


and in the context of the mode of writing assigned. 


3.3 Recruitment of Scorers 
It was established that, inclusive of the 50 essays to be scored 
by everyone (see below), no scorer should receive more than 300 
essays for holistic scoring. Knowing in advance the approximate 
number of students scheduled to write the essay, we determined 
that 36 scorers would be required, and we decided that each essay 
should be scored by one postsecondary English instructor and by 
two secondary school teachers of senior English. Therefore we 
sought 12 university/CAAT scorers and 24 secondary school


scorers. 
As testing was scheduled for late May and the essays had to
be collected, xeroxed and packaged before they were sent to the
scorers, we calculated that scoring could not occur until the
second half of June and we were fearful that good scorers might
be scarce at that time of the school year. But, with the
assistance of the executive of the Ontario Council of Teachers of
English, we were able to identify a pool of about 125 potential
scorers, most of whom, when contacted by letter, expressed
willingness to assist.


The selection of the twelve postsecondary scorers was 
relatively simple, as there were fewer replies here; we decided 
to use 6 from CAATs and 6 from universities and followed the 
principle that one or at most two from each of institutions as 


widely separated geographically as possible would be enlisted. 


In drawing the remaining 24 (secondary school) scorers, we 
split the respondents into those with more than ten years' 
secondary school teaching experience and those with ten or fewer 
years' experience, using twelve from each group. The twelve then 
were drawn having regard for geographical distribution so that 
scorers would be widely spread geographically and so that there 


would be representation from both urban and rural areas. 


With the exception of the 50 essays which were assessed by
all scorers, each essay was scored by three persons: one 
instructor in either a CAAT or a university; and two secondary 
school teachers, one with more than ten years' experience, and 


one with ten or fewer years' experience at the secondary level. 


3.4 Preparation of Scorers 


Several weeks prior to their receipt of the Interface essays, 
scorers were sent fourteen essays of varying quality from Grade 
Twelve or Thirteen students and from CAAT students, together 


with an explanation of holistic scoring, the background of the 


current project, and instructions for the warm-up exercise using 


the fourteen essays. 


With their scoring sheets for the warm-up exercise, scorers 
were asked to return a list of the criteria they considered most 
important in judging this sort of writing, and an indication of 
the relative priority they attributed to each of those criteria. 
These data were later used in approaching the appraisal of essays 
for positive characteristics but were definitely not used in 
instructing scorers in the use of specific criteria for the 


holistic scoring of Interface essays. 


The chief purposes of this warm-up run were as follows: to 
familiarize scorers with the holistic scoring method and to 
provide further direction where needed respecting use of the whole 
scale; to anticipate questions or problems so that these would 
not interfere with the rhythm of the scoring of the Interface 
essays; and to stand in place of a formal training session which, 
given the spread of scorers around the province, would have been 
very costly. (Note: our "net" score reliability of -Onw 45
compares very favourably with similar endeavours in holistic 
scoring with three scorers, and suggests that a formal training 
session with all scorers gathered in a single location is 


unnecessary. ) 


3.5 Main Essay Marking Tasks 
The main task of marking some 1600 essays was done following the
same basic procedure used in the warm-up exercise. Each marker
received a stack of approximately 210 essays. These included the
50 essays that were scored by all markers. The way in which the
50 were chosen and the different stacks of essays were prepared
for the scorers is described in the main report (see Chapter Two,
Part A, subsection 3.6). The reason for having 50 essays scored
by all markers was to implement a special essay scoring
procedure. The necessity for this procedure is explained in
Chapter Two, Part A, subsection 4.2; the procedure itself is
described in Appendix D2.


3.6 Rescoring the Fifty Essays 


We had not anticipated an opportunity to have the 50 essays 
re-scored, as we feared that intrusion of summer holidays would 
force many potential scorers to refuse our request to do the 
initial June scoring if we added the rider that a further scoring 
in July was entailed. However, circumstances later encouraged us
to request the additional scoring. For one thing, response to the
recruitment program had been surprisingly enthusiastic; for
another, since a substantial number of students who were expected
to write the essay did not in fact do so, each scorer had
received fewer essays than the 300 he had agreed to score. Hence
our request for additional scoring went out, and twenty-eight of 
the thirty-six scorers expressed a willingness to do the job. 
(That it was, in fact, a re-scoring job was made clear to the 


twenty-eight in July, before their task began.) 


The re-scoring of the fifty essays provided us with
important additional information concerning the reliability of 
marking. A report on the reliability of the essay scoring is 


provided later in this technical report (see subsection 3.11). 


3.7 Item Difficulty


The difficulty of items in each form of the Test of Reading
Comprehension and Language Achievement (English) was estimated
for the population of Anglophone students as follows: the
percentage of correct responses to an item was determined
separately for each school in the sample and the resulting
percentages were averaged over all the schools. The resulting
averaged percentages are referred to as indices of
difficulty--really they are indices of easiness but conventional
usage is followed here. The indices for all the items in both
forms are reported in Table A1.1. Separate indices are provided
for SSGD and SSHGD students.
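
As a rough illustration of the averaging just described, the sketch below computes a percentage correct within each school and then takes the unweighted mean across schools. The function name, the 0/1 data layout, and the treatment of every non-correct response as 0 are assumptions of the example; the report does not say how blank responses entered the percentages.

```python
def difficulty_index(schools):
    """Index of difficulty for one item.

    `schools` is a list with one entry per school; each entry is a list of
    0/1 scores on the item for that school's students (1 = correct).
    The percentage correct is computed per school, then averaged over
    schools, so each school counts equally whatever its size.
    """
    per_school = [100.0 * sum(scores) / len(scores) for scores in schools if scores]
    return sum(per_school) / len(per_school)


# Hypothetical data: a small school at 50 per cent and a larger one at 80.
small_school = [1, 1, 0, 0]
large_school = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
index = difficulty_index([small_school, large_school])
```

Averaging school percentages differs from pooling all students: in the hypothetical data the school-averaged index is 65 per cent, whereas pooling the fourteen students would give about 71 per cent.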


The results reported in Table A1.1 clearly support two
conclusions:


(a) The items in the tests were less difficult for SSHGD
students than for SSGD students. This finding is not
unexpected, however, because the SSHGD students had
had a year more schooling on average than the SSGD
students and they comprised a more select group of
students: the SSGD group consisted of both general and
advanced level students whereas the SSHGD group, by
definition, consisted only of advanced level students.


(b) Form 1 of the test was somewhat easier than Form 2.


A baseline is needed against which to evaluate these
results on difficulty. One such baseline is the figure
of 60 per cent. This figure is the midpoint between 20
per cent, the percentage of correct answers that would
be expected on a five-option multiple-choice question
if responses were entirely random, and 100 per cent.
Items with difficulty indices at or below the level of
chance performance seem inappropriate for obvious
reasons, whereas items with difficulty indices near
100 per cent fail to discriminate among examinees. In
tests designed to spread students over the range of
scores on the test, items having difficulty indices in
the middle of the difficulty range, in this case near
60 per cent, are most valuable.
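
The 60 per cent baseline follows from simple arithmetic: chance performance on a five-option item is 100/5 = 20 per cent, and the midpoint between 20 and 100 is 60. A minimal sketch, with a hypothetical function name, makes the calculation explicit for any number of options.

```python
def target_difficulty(n_options):
    """Midpoint between chance performance and 100 per cent correct."""
    chance = 100.0 / n_options  # expected per cent correct from random responding
    return (chance + 100.0) / 2.0


baseline = target_difficulty(5)  # five-option items, as in these tests
```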


When the average difficulty of items in each test form is viewed
from the foregoing perspective, it would seem that both forms 
were on the difficult side, although Form 1 came close to being 
ideal for SSHGD students. In addition, both forms contained items 
having difficulty indices of less than 20 per cent--items of 


somewhat questionable value in testing SSGD and SSHGD students. 
3.8 Speededness


Both forms of the Test of Reading Comprehension and Language
Achievement (English) were divided into three sections. These
sections were not separately timed in any strict sense, although
students were told at the end of the first 10 minutes of the
testing period to go on to the second part of the test if they
had not already reached that part, and after 20 minutes they were
told to go on to the third part of the test if they had not
already reached it. Because of this, it seems reasonable to look
for evidence of speeding at three points in the test--the end of
Part One, the end of Part Two, and the end of the test.


A test is speeded, by definition, if students do not have
enough time to respond to all the questions it contains. Because
work rates differ, it is impractical to allow sufficient time for
all students to answer every item. One rule of thumb that is
sometimes used for judging the speededness of a test is as
follows: a test is said to be speeded if less than 100 per cent
of the students reach the three-quarter mark in the test and if
less than 80 per cent of the students complete the test.
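
The rule of thumb can be expressed as a short check. This is a sketch under stated assumptions: the function name and data layout are hypothetical, the two thresholds are parameters whose defaults (100 and 80 per cent) are the values assumed here for the rule, and the two conditions are joined by "and" as in the wording above.

```python
def is_speeded(last_reached, n_items,
               pct_reach_three_quarters=100.0, pct_complete=80.0):
    """Apply the rule of thumb for speededness.

    `last_reached` holds, for each student, the number of the last item
    the student reached (0 if none). The thresholds are percentages and
    are assumptions of this sketch.
    """
    n_students = len(last_reached)
    three_quarter_mark = 0.75 * n_items
    reached = sum(1 for item in last_reached if item >= three_quarter_mark)
    completed = sum(1 for item in last_reached if item >= n_items)
    pct_reached = 100.0 * reached / n_students
    pct_completed = 100.0 * completed / n_students
    return pct_reached < pct_reach_three_quarters and pct_completed < pct_complete


# Hypothetical 40-item test: seven students finish, three stop at item 20.
speeded = is_speeded([40] * 7 + [20] * 3, n_items=40)
```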


As evidence of speededness one looks for a sign that
students have had insufficient time to answer the questions in a
test. This means looking at the percentage of students who have
failed to respond to an item. An arbitrary distinction is
sometimes made, when failures to respond are being considered,
between an omitted item and an item that was not reached. An
item is said not to have been reached if the student fails to
respond to it and to every other item that follows it in the
test. Otherwise, the item is declared to have been omitted.
Accepting this distinction, the percentage of students who failed
to respond because they did not reach an item is the index that
should be used to judge the speededness of a test. In the
analysis of responses to the Test of Reading Comprehension and
Language Achievement (English), the percentage of students who
"did not reach" is a useful index of the speededness of the third
part of the test. But, because a student always responded to
items appearing later in the test than items in Parts One and
Two, it is not possible to compute the percentage of students who
did not reach items in these parts of the test. Instead, evidence
of speededness in Parts One and Two must be sought in the
percentage of students omitting an item. If this percentage
increases dramatically over the last few items in Part One or
Part Two of the test, then the test was clearly speeded.
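
The distinction between an omitted item and an item not reached depends only on where a student's last response falls. The sketch below labels each blank accordingly; the function name and the use of `None` for a blank response are assumptions made for illustration.

```python
def classify_blanks(responses):
    """Label each item 'answered', 'omitted', or 'not reached'.

    A blank (None) followed by at least one later response is an omission;
    a blank with no response anywhere after it is treated as not reached.
    """
    last_answered = -1
    for position, response in enumerate(responses):
        if response is not None:
            last_answered = position

    labels = []
    for position, response in enumerate(responses):
        if response is not None:
            labels.append("answered")
        elif position < last_answered:
            labels.append("omitted")
        else:
            labels.append("not reached")
    return labels


# Hypothetical student: skips item 2, stops after item 3 of five.
labels = classify_blanks(["A", None, "C", None, None])
```

Under this definition a student who answers anything in Part Three can have no "not reached" items in Parts One or Two, which is why omissions serve as the evidence for those parts.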


The percentages of students who omitted and who did not
reach an item in each form of the Test of Reading Comprehension
and Language Achievement (English) are reported in Table A1.2.
These percentages were computed in the same way as the difficulty
indices reported in Table A1.1.


Consider the evidence for Part One of both test forms. The 
division between Part One and Part Two occurred in each form 
between items 10 and 11. The build-up of omissions from about
item 6 through item 10 is very apparent. This build-up, when
assessed using the rule of thumb stated earlier, suggests that
Part One of each form was speeded for both groups of students. 
Because the build-up in omissions is greater and more consistent 
for Part One of Form 2 than for Part One of Form 1, it seems 


that this part in Form 2 was more speeded than in Form 1. 


There is evidence that Part Two of each form was also
speeded. The point of division between Part Two and Part Three 
occurred in the test between items 21 and 22. The build-up in 
omissions is even greater for Part Two of each form than for Part 
One. The limits specified in the rule of thumb for speededness 
are again exceeded. Clearly, the second part of both forms was 
highly speeded, and, once again, Form 2 seems to have been more 


highly speeded than Form 1. 


The speededness of the third part of each form can be
judged from the "not reached" percentages. These are relatively
low regardless of the form being considered. The limits specified
in the rule of thumb concerning speededness are either completely
satisfied or very nearly so. It would be fair to conclude that
the third part of each test was not speeded to any significant
degree.


Reasons for the apparent speededness of the second part of 
each form may be found in the criticisms of the "construction
shift" type of item voiced by many teachers in their test 
appraisal inventory responses. Teachers themselves found the 
instructions for this item type very confusing and feared that 
students would also be confused and have their performances 
affected detrimentally. Unfortunately as well, some students 
taking the tests had insufficient time beforehand with the Student 
Handbook, where the item type was explained and practice items 
supplied. The evidence of speededness shows that the test, as a 
measure of language competence, did not work as well as it should 
have. Care should be taken in future administrations of this kind
of test, particularly if the "construction shift" type of item is
involved, that ample opportunity be provided for students and 
teachers to become familiar beforehand with the test's style and 


purposes. 


3.9 Item Discrimination


The biserial correlation between scores on an item--the item is
scored "1" for correct and "0" for wrong, omitted, or not
reached--and scores on the total test provides a crude index of
discrimination. In tests designed to spread students over the
range of possible scores on the test, the biserial correlation or
index of discrimination for an item should be relatively high, say
0.3 or higher.
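
For readers unfamiliar with the statistic, the sketch below computes a biserial correlation using the standard textbook formula: the difference between the mean total scores of students passing and failing the item, divided by the standard deviation of total scores, multiplied by pq/y, where p is the proportion passing, q = 1 - p, and y is the height of the standard normal curve at the point dividing it into areas p and q. The function name and the sample data are hypothetical, and the report does not state which computing formula was actually used.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item_scores, total_scores):
    """Biserial correlation between a 0/1 item score and total test score."""
    p = mean(item_scores)  # proportion of students scoring 1 on the item
    if p <= 0 or p >= 1:
        raise ValueError("item must have both correct and incorrect responses")
    q = 1 - p
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    sd = pstdev(total_scores)  # standard deviation over all students
    normal = NormalDist()
    ordinate = normal.pdf(normal.inv_cdf(p))  # normal curve height at the p/q cut
    return (mean(passed) - mean(failed)) / sd * (p * q) / ordinate


# Hypothetical data for ten students.
item = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
total = [30, 25, 22, 28, 12, 18, 20, 15, 10, 26]
index = biserial(item, total)
```

The biserial coefficient is always larger in magnitude than the point-biserial computed from the same data, and with markedly non-normal score distributions it can exceed 1.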


Indices of discrimination were computed for each item in
each form of the Test of Reading Comprehension and Language
Achievement (English) using the responses of two groups of
students. These groups were subsamples of the total sample of
Anglophone SSGD and SSHGD students tested in the study. Each
group took both forms of the test, but one group took them in the
order Form 1, Form 2 and the other took them in the reverse
order. Indices of discrimination are reported for the first form
taken by a group. It should be noted as well that both groups 
were composed of both SSHGD and SSGD students. The two groups 
were formed at random and because they were drawn from the same 


pool of subjects, they can be said to be randomly equivalent. 


The indices of discrimination are reported in Table A1.3.
Note that the indices are above 0.3 for all items except three,
two in Form 1 and one in Form 2. In view of these results, it
can be concluded that the items in both forms had acceptably high 


indices of discrimination. 


3.10 Distribution and Reliability Statistics


Further information about the two forms of the Test of Reading 
Comprehension and Language Achievement (English) is provided by 
the statistics reported in Tables A1.4 and A1.5. These
statistics were computed from the test responses of the two 
groups of students described in the previous subsection on item 
discrimination. Although both these groups of students took both 
forms of the test, they took them in different orders. The 
statistics that are reported are for the form of the test that


was taken first. 


From the mean scores reported in Table A1.4, it is evident
that Form 1 was the easier of the two. Although both forms were 
somewhat difficult for these students--the means of scores 
corrected for guessing are less than one-half the number of items 
in the forms--they were not excessively difficult, as evidenced 


by the presence of perfect scores on both forms. 


An internal consistency measure of reliability and the 
corresponding standard error of measurement of each form are 
reported in Table A1.5. These statistics provide information
about the stability of the score achieved by an individual
student. The reliability coefficients are lower than the level of 


coefficient that can be achieved with tests of this sort. This 
seems to be due in large measure to the fact that the content of
the test was heterogeneous. The test was purposely made 
heterogeneous in order to assess several different components of 
language achievement, but as a consequence, the correlations 
among the three parts of the test are only moderately high (see 
Table A1.5). This has the effect of reducing overall test
reliability. 
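
The report does not state which internal consistency coefficient was computed. Purely as a sketch, the following Python code assumes coefficient alpha (which equals KR-20 when items are scored 0 or 1) and shows how the standard error of measurement follows from the reliability and the standard deviation of total scores. The function names and the small score matrix are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a score matrix (rows = students, columns = items)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(total_sd, reliability):
    """SEM = SD of total scores times the square root of (1 - reliability)."""
    return total_sd * (1 - reliability) ** 0.5


# Hypothetical responses of five students to four items scored 0 or 1.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = coefficient_alpha(scores)
total_sd = pvariance([sum(row) for row in scores]) ** 0.5
sem = standard_error_of_measurement(total_sd, alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

Because alpha depends on the covariances among items, a deliberately heterogeneous test of the kind described above tends to yield a lower coefficient than a homogeneous test of the same length.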


Is the level of reliability so low that it will invalidate 
the results of this study? The answer to this question is an
unequivocal, No! The test forms have an acceptable level of
reliability for making comparisons among groups and for use as 
variables in a regression study. These are the two uses to which 


the instruments were put in the present work. 


3.11 Reliability of Essay Scoring


The plan that was followed in scoring the Writing Test made it 
possible to estimate a coefficient of reliability and a standard 
error of measurement for the scores assigned to the essays. 
Recall that a subset of 50 essays was drawn at random from the 
total set of essays and that each of the 36 markers of the
Writing Test marked the essays in this subset. Three weeks after
the full set of essays had been marked, the subset of 50 essays 
was sent to the markers for rescoring. Eight markers were unable 
to score the essays a second time, and so in the end two sets of 
marks on the 50 essays in the subset were obtained from each of 


28 scorers.


The procedure that was used to estimate the reliability and 
the standard error of measurement of the essay scores involves the 
application of analysis of variance to the scores assigned by the 
28 markers to the 50 essays on the two occasions. This procedure 
is described in detail by Cronbach, Gleser, Nanda and Rajaratnam 
(1972, pp. 42-45, 86-90, 97-99). It consists of estimating the
component of variance associated with each of the factors in the 


design of the experiment for collecting the scoring data and using 


these components to compute an intraclass correlation coefficient 
and the square root of an error variance. These are respectively 
the estimates of reliability and the standard error of 


measurement.


The results of the analysis of variance are summarized in 
Table A1.6. Note that the largest component of variance, as one
would expect, is for essays. This indicates that the largest 
portion of the variation in essay scores is attributable to 
differences in the quality of essays. The other relatively large 
components of variance are for the following sources: markers, 
reflecting differences among markers in standards; the interaction 
between essays and markers, reflecting the fact that the score a 
marker assigns to an essay is not wholly explainable in terms of 
the extent to which that particular marker is a more (or less) 
severe marker than the average marker and that particular essay is 
a more (or less) superior essay than the average essay; and the 
residual, reflecting error variance and variance due to the 
interaction among markers, essays and occasions. The estimate of 
the component of variance for occasions was negative, a result 
that is impossible in theory, but obviously possible in practice. 
For this reason, it was set to zero, as recommended by Cronbach
et al. (1972).


The reliability of the essay scoring was estimated from the 
components of variance reported in Table A1.6. All sources of
variance except for essays were treated as sources of error. In 
addition, it was assumed that each essay would be scored by three 
different markers and that the final score assigned to an essay 
would be the average of the three scores it was given by the 
three scorers. The effect of this assumption was to reduce the
contribution to error of the components of variance due to markers
and to the interactions between markers and other sources to only
one-third of the values reported in Table A1.6. The residual
component of variance was similarly reduced to one-third the
tabled value. A coefficient of reliability is estimated as the
ratio of the component of variance for essays to the sum of the


components of variance due to essays and to the other sources (as 
reduced in the way described earlier in this paragraph). The
estimate of reliability is 0.74. The associated standard error of 
measurement is the square root of the sum of the components of 


variance attributable to error; this number is 0.90. 
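
The calculation described in this paragraph can be sketched in Python as follows. The component names and the numerical values passed to the function are hypothetical placeholders, not the values in the table of variance components; only the reported results of 0.74 and 0.90 come from the text. The last lines show what those two reported figures jointly imply about the essay component of variance.

```python
def essay_score_reliability(components, n_markers=3):
    """Return (reliability, standard error) for the mean of n_markers scores.

    Every source other than "essays" is treated as error. Components that
    involve markers, and the residual, are divided by n_markers. Negative
    estimates are set to zero.
    """
    marker_related = {
        "markers",
        "essays_x_markers",
        "markers_x_occasions",
        "residual",
    }
    error = 0.0
    for source, value in components.items():
        if source == "essays":
            continue
        value = max(value, 0.0)
        error += value / n_markers if source in marker_related else value
    essays = components["essays"]
    return essays / (essays + error), error ** 0.5


# Hypothetical variance components, for illustration only.
components = {
    "essays": 2.0,
    "markers": 0.6,
    "occasions": -0.02,          # negative estimate, treated as zero
    "essays_x_markers": 0.9,
    "essays_x_occasions": 0.05,
    "markers_x_occasions": 0.03,
    "residual": 0.9,
}
reliability, sem = essay_score_reliability(components)

# Consistency check on the reported figures: with reliability 0.74 and a
# standard error of 0.90, the error variance is 0.81 and the implied essay
# component is 0.74 * 0.81 / (1 - 0.74).
implied_essay_variance = 0.74 * 0.90 ** 2 / (1 - 0.74)
print(round(reliability, 2), round(sem, 2), round(implied_essay_variance, 2))
```

Raising `n_markers` shrinks only the marker-related part of the error, which is why averaging over more markers improves reliability but cannot remove error that is unrelated to markers.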


The reliability coefficient and standard error of 
measurement can be interpreted as follows: the reliability
coefficient is an estimate of the correlation that can be expected 
between the "observed" scores for essays, where the "observed" 
score for an essay is the mean of the scores assigned on just one 
occasion by three different markers chosen at random from the 
population of markers, and the "true" ("universe") scores for the 
same essays, where the "true" score for an essay is the mean of
the scores the essay would receive if it were graded just once by
every marker in the population of markers. (This population can
be thought of as that large group of persons qualified to mark 
English essays and from which the individuals who did mark the 
essays were drawn at random.) The standard error of measurement 
is interpretable as the standard deviation of the observed scores 
an essay would receive if it were to be marked by many different 
groups of three markers, each group chosen at random from the 
population of markers. If it is assumed that such a distribution 
of observed scores as this for an essay is approximately normal, 
then the interval obtained by first adding the standard error of 
measurement to the observed score for an essay, and then
subtracting it from the observed score is that interval within
which the true score of the same essay would be expected to fall
with probability 2/3.
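
The probability attached to an interval of one standard error on either side of the observed score can be checked numerically under the normality assumption stated above. The following Python sketch does this; the helper names and the observed score of 6.0 are hypothetical, while the standard error of 0.90 is the value reported in the text.

```python
from statistics import NormalDist


def coverage(k):
    """Probability that a normal variable lies within k standard deviations
    of its mean."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)


def score_interval(observed, sem, k=1.0):
    """Interval of k standard errors on either side of an observed score."""
    return observed - k * sem, observed + k * sem


one_sem_probability = coverage(1.0)
low, high = score_interval(6.0, 0.90)   # 6.0 is a hypothetical essay score
print(f"{one_sem_probability:.3f}", round(low, 1), round(high, 1))
```

The computed probability is about 0.68, or roughly two chances in three.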


Is a coefficient of reliability of 0.74 suitably large, and 
a standard error of measurement of 0.90 suitably small? The 
answer to this question depends on the use to which the essay 
scores are put. Given this standard error of measurement, we 
should not have great confidence in declaring that the higher 
scoring student of two students who achieve closely similar scores 
is the better writer. In other words, in comparing individuals 
achieving similar scores on this test, the risk of being wrong in 


judging one or the other to be the better writer would be
relatively high. In the present instance, however, the essay
scores were used (i) to make comparisons among different groups of
students, and (ii) as predictors of school marks. The obtained
reliability of the essay scores is satisfactory for these


purposes. 


Four additional qualifying remarks are in order.


(a) The estimate of reliability that has been reported is 
only an estimate of the reliability of scoring. It
reflects nothing about the variation that might be 
found to exist in essay performance if the same 
students were asked to write more than one essay. The 
quality of a student's writing is known to vary as a 
function of such factors as the mode of the writing 
assignment, the choice of topics offered, motivation to 
do well and the student's state of mind. Because of 
these factors, the correlations that one might expect 
to observe between the scores achieved by the same 
students on different essays are almost certainly less 
than 0.74. And yet it is also true that the more 
samples of a student's writing that are collected and 
scored, the better the basis one has for judging his 
ability. Even though students' scores on two essays do 
not necessarily correlate highly, both essays together 
provide a better measure of writing ability than does 


either one taken alone. 


(b) The evidence on reliability that has been presented is 
applicable to the case where the observed score 
assigned to an essay is the simple average of the 
scores given the essay by three different markers. A 
simple average was not used in this study; instead, the
three scores that an essay received were differentially 
weighted to form a weighted average (see Appendix D2). 
This being the case, the results reported here do not 
strictly apply to the essays as they were finally


scored. But these results must be a close approximation
to the true results. (Note that the correlation between
the simple averages and weighted averages for the
English essays was 0.97, and the variances of both the
simple and the weighted averages were identical to the
second decimal place.)


(c) The effect of handwriting on the marking was not
controlled by having the essays typed. Even a cursory
glance at the essays provides convincing evidence that
the quality of penmanship varies greatly from student
to student. Just how much the markers were influenced
by quality of handwriting is not known, but the
contribution of this factor to the unreliability of the
marking is almost certain to have been considerable.


(d) The single number that is reported as the standard
error of measurement of the essay marking does not 
reflect the fact that markers disagreed more over some 
essays than over others, and that, in consequence, the 
scores assigned some essays were more variable than the 
scores assigned others. It is a question of some 
interest why markers achieve a greater consensus over 
one piece of work than over another. Perhaps, as James 
Britton has indicated (personal communication), the 
originality or unpopularity of ideas contained in an 
essay, and the informality of the tone of an essay are 
factors that divide markers and cause scores to vary 
widely. Subjective impressions formed during the
conduct of this study suggest that topic and mode may 
also be related to variability of marking. Systematic 
study of the influence of these factors on variability 
of essay marking is called for to see whether this is 


an avenue to improved reliability of essay scoring. 


3.12 The Essays and the Scale


A. Introduction 


While holistic scoring provides the most reliable method of 
assessing essays in relation to each other, the scale itself is 
arbitrary; scores provide no characterization of the writing 


quality beyond "better than..." or "worse than...". 


It was considered important, therefore, to characterize the 
writing across the scale, as far as was reasonable, with 


reference to such questions as the following: 


(a) What sorts of errors were students making, and with 


what frequency? 


(b) To what extent could frequency of error be seen as


influencing the scores given the essays? 


(c) What positive characteristics (of style, organization, 
effectiveness of argument, etc.) did the essays
exhibit?


(d) What proportion of the essays could be considered to 
meet a standard of literacy acceptable for entry to


postsecondary institutions? 


For the error count and the assessment for postsecondary 
entrance standard of the essays, the 50 essays scored by all
scorers were used, as these had been randomly drawn initially and 
their scores had proven to be of satisfactory reliability. Thus, 
although they did not happen to distribute themselves evenly 
across the scale, they served most reliably to reflect the 


meaning of scores near those points where their scores fell. 


For the appraisal of other writing characteristics, these
50 essays were supplemented by an additional 50 drawn at random
from each decile in the total distribution of essay marks in 
order that we would have a larger sampling of styles and of style 


variation resulting from choice of topic. 


It must be stressed that all observations and
generalizations made in the following sections are very tentative, 
first of all because the sample used was not large, and second 
because there is a good deal of subjectivity in what a person 
defines as "literacy", "good style", etc. There is even a
surprising amount of subjectivity in what is considered an error,
or a major error as distinct from a minor error, regardless of 
the detail provided in instructions to those counting errors 
(Britton et al., 1966). 


We hope, nevertheless, that these analyses will provide 
some factual basis for drawing conclusions about the present 
quality of student writing at the interface, conclusions somewhat 
different from the general charges, largely unsubstantiated except 
by the occasional quoted "horrible example", appearing almost 
daily in the press. It is worth mentioning too that further 
analysis of the writing, making use of a larger sample, would 
undoubtedly tell us more about the actual situation, and put into 


another perspective the voices of mere opinion. 


There are two other caveats that must be stressed in 
considering the performance on the essay in the Interface study as 


typifying writing performance in Grades Twelve and Thirteen. 


The first is that the students in the study were, for 
practical reasons as described earlier, restricted to a single 
mode: the expository essay developing a point of view. While the 
results from the Test Appraisal Inventory strongly confirmed our 
choice of mode as the best if only one could be chosen, still, it 
is certainly true that not all students write as effectively in 


this mode as they do in others; therefore, ideally, they should 
have an opportunity to write in at least two modes (Britton et
al., 1966, p. 82).


The second caveat is the problem of motivation. The 
students in the study were "semi-volunteers" for the tests and 
essays, and pressures of time in May did not give some of them 
much chance before writing the tests to study the Student
Handbook or to ascertain the purposes of the study. Motivation of 
many was low; there seemed little benefit for the individual in 
the exercise. That this adversely influenced performance (or 
non-performance) was evidenced by the number of absentees, by 
comments from teachers on the Test Appraisal Inventory, and by 
flippant or cynical remarks appended by some students to their 
essays. As well, some essays were rather deliberately off topic 
or beside the point as students chided the tests and the context 
of the testing. We must therefore be careful not to assume that 
the writing generally typifies that which would emerge if the 
students had been highly motivated either intrinsically through 
personal interest in the topics or extrinsically by examination 


marks or other rewards. 


B. Analysis of Errors 


The purposes of the error count were these: 


(a) To obtain a reasonable picture of the sorts of errors 
present in the students' writing and the frequency of 
those errors. The error count was a concrete response 
to the general criticism of student writing at the 
interface. The public is repeatedly being told by 
prestigious persons that "most" or "many" secondary 
school graduates can't write a complete sentence, can't 
spell, write ungrammatically, are "functional 
illiterates", and so on. The error count was conducted 


in an attempt to measure the truth of these charges. 


(b) To determine the extent to which error frequency of
various sorts appears to influence the scores given
essays in holistic scoring. From the Test Appraisal
Inventory as well as from independent studies such as 
Paul Diederich's Measuring Growth in English (1974) and 
the just-completed The Queen's English (Norman, 1976), 
it is apparent that satisfactory performance in
"grammar and mechanics", variously described, is
perceived as an important component of literate 
writing. Often, from the emphasis this factor 
receives, it appears to be considered a conditio sine
qua non of acceptable writing.


Three sources were examined in preparing for the error 
count: the Britton et al. study (1966), the National Assessment 
procedure in their first round of writing assessment (Report # 8, 
for 1972), and the list of major and minor errors in the
Instructions to markers of past Grade Thirteen English
Composition examinations in Ontario, published repeatedly in S4C
and regularly used by teachers in marking writing in the senior 


grades. 


In general we followed the procedure outlined in the 
National Assessment, restricting ourselves, however, to 200 
running words rather than the 300 used in the NAEP, as our 
essays frequently did not run to 300 words. We used two 


"counters" and averaged counts. 


With the exception of paragraphing, we used the National 
Assessment error classifications, but with considerable 
modification to bring the classifications into close alignment 
with the types of errors recognized in the Grade Thirteen
"departmental" examinations. Error classifications and 
instructions to the error counters are presented in Appendix A1B 


of this technical report.


Paragraphing errors were not included in the count for the 
following four reasons: 200 running words generally do not 
provide a sufficient number of complete paragraphs; the question 
of what constitutes a paragraphing "error" as distinct from an 
inferior or "so-so" paragraph is rather debatable; the National 
Assessment found seriously defective paragraphs a very infrequent 
occurrence anyway; and research on the nature of the paragraph
(topic sentence, etc.) in current good writing practice reveals 
much greater variety than standard language texts would imply (see 
Braddock, 1974). 


In the error counting procedure we provided a more elaborate 
classification of types of error under grammar and diction than is
reported; however, as total errors per 200 words for specific 
types of grammar and diction errors were quite low, we have 


combined them for ease in assimilating the information. 


The main results of interest are presented in Tables A1.7
and A1.8. (Ignore, for the moment, all the numbers in these
tables that appear in brackets.) The first of these tables
contains means and standard deviations. It should be recalled, as 
these results are studied, that the number of spelling errors in 
an essay contributes to the number of errors in conventions for 
the essay and that the total number of errors for an essay is the
sum of the number of errors in conventions and the number of 


errors in grammar, sentence structure and diction. 


An important observation that can be made about the results 
presented in Table A1.7 is that, for all the error counts, the
mean number of errors over the 50 essays is only a very little 
larger in magnitude than the standard deviation. This state of 
affairs reflects the fact that the distributions of error counts 
are positively skewed, with most essays having a number of errors 
less than the mean and a very few essays having a number of 


errors very much larger than the mean. 


The correlations reported in Table A1.8 are very much in


line with intuition. Two points at least merit emphasis. 


(a) The global rating of essay quality is negatively 
related to each of the error counts and the 
correlations are moderately large. This suggests that 
the quality of an essay as estimated through the global 
rating procedure used in this study is associated to an 
appreciable extent with the number of errors an essay 
contains. Another way to interpret the correlation of 
-0.58 between global rating of quality and total
number of errors is as follows: given this correlation 
and the standard deviations of these two variables, the 
slope of the regression of errors on rating can be 
computed. This slope is -2.0, signifying that for
every increase in judged essay quality of one mark (on 
the scale from 1 to 10), a corresponding decrease of 


approximately two errors can be expected. 


(b) The various error counts are positively related, 
indicating that students prone to make one kind of 
error (e.g. spelling) are also prone to make other 
kinds of errors (e.g. grammar, sentence structure, 


diction). 
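
The slope mentioned in point (a) follows from the standard relation between a correlation and a regression slope: slope = r x (SD of errors) / (SD of ratings). The Python sketch below applies that relation. The two standard deviations passed to the function are hypothetical, chosen only to illustrate the arithmetic; the correlation of -0.58 and the slope of -2.0 are the values given in the text, and the last lines show the ratio of standard deviations that those two values imply.

```python
def regression_slope(r, sd_y, sd_x):
    """Slope of the least-squares regression of y on x."""
    return r * sd_y / sd_x


# Hypothetical standard deviations: total errors (y) and essay rating (x).
sd_errors = 6.9
sd_rating = 2.0
slope = regression_slope(-0.58, sd_errors, sd_rating)

# Ratio of SD(errors) to SD(rating) implied by the reported r and slope.
implied_sd_ratio = -2.0 / -0.58
print(round(slope, 2), round(implied_sd_ratio, 2))
```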


A difficulty arises in interpreting the results presented in 
Table A1.8. The distributions of errors are skewed. The danger
exists that a small number of extreme values on a variable could 
be responsible for the presence of correlations as high as those 
that were observed. This possibility was put to the test by 
"trimming" the data. The four essays having the largest total 
number of errors were excluded and the analysis was redone. The 
results for the reduced set of 46 essays are reported in
parentheses in Tables A1.7 and A1.8. Note that the effect of
excluding the four essays was to lower the means and the standard
deviations of the error counts. The drop in the standard
deviations for errors in conventions and total number of errors 


was especially large. But despite these changes in means and 
standard deviations, the pattern of correlations (see the figures
in parentheses in Table A1.8) was affected relatively little. In
particular, the moderately strong, negative relationship between
essay rating and total number of errors was maintained. (The
slope of the regression of total errors on essay score, after
"trimming", is -1.2. This result is somewhat more satisfactory
than the slope of -2.0 obtained for all 50 essays because the
expected frequency of errors in an essay scored 10 is near 0 for
the trimmed regression, not an unrealistic negative quantity as
for the untrimmed regression.)
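
The "trimming" check can be sketched in Python as follows. The twelve ratings and error counts are invented for illustration and two essays are dropped rather than four; the study itself used 50 essays whose individual values are not reproduced here. The function names are likewise hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def trim_largest(ratings, errors, n_drop):
    """Drop the n_drop essays with the largest total error counts."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    keep = sorted(order[: len(errors) - n_drop])
    return [ratings[i] for i in keep], [errors[i] for i in keep]


# Hypothetical data: essay ratings and total error counts for twelve essays,
# with two extreme error counts producing a positive skew.
ratings = [9, 8, 8, 7, 7, 6, 6, 5, 5, 4, 3, 2]
errors = [1, 2, 4, 3, 5, 4, 6, 7, 5, 9, 18, 25]

r_all = pearson(ratings, errors)
trimmed_ratings, trimmed_errors = trim_largest(ratings, errors, 2)
r_trimmed = pearson(trimmed_ratings, trimmed_errors)
print(round(r_all, 2), round(r_trimmed, 2))
```

If the negative correlation survives the removal of the extreme cases, as it did in the study, it cannot be attributed to those cases alone.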


The question of what constitutes a tolerable frequency of 
error in essays at the senior level, whether these be errors in 
spelling or major errors in sentence structure, cannot be answered
from these results except arbitrarily. A frequency of 6 errors in
sentence structure, grammar and diction in 200 running words (the 
mean for the 11 essays judged to be poorest in the holistic 
scoring) is likely to be viewed as intolerable, while the overall 
average frequency of 3.3 may be viewed as acceptable if errors,
let us say, in conventions, are few; or, alternatively, a higher 
frequency of convention errors may be tolerable where other errors 
are few. This appears to be illustrated in the fourth ranked 
essay (average rating 8.0) which has the relatively high frequency 


of 4.5 errors in convention. 


On the other hand, however, the essay may be of exceptional
merit on entirely other grounds so that a higher frequency of 
errors overall is tolerated, as appears to be the case in the 
ninth ranked essay (average rating 6.9, number of errors 8.5) and 
even more dramatically in the essay ranked thirteenth (average 
rating 6.8, total number of errors 12.0). It is apparent from 
the less than perfect correlation between error count and average 
rating, that error frequency is not, by itself, the determinant 


of essay quality. 


It is not safe to go beyond the following fairly obvious


generalization: there is a moderately strong tendency for essays 


receiving lower scores to have a higher frequency of errors, but 
it is not consistently the case that low frequency of error is
reflected in high scores or vice versa. Obviously, there are many 
other characteristics of writing that must be considered in 


assessing "competence" or "acceptability" or "literacy". 


One other observation should be made. The overall mean 
error rate of 7.1 per essay includes all the errors, gross or 
trivial, made in an essay. Further study of the errors reveals 
that, on the average, students made 0.8 major errors per essay 
in sentence structure, and 1.1 grammatical errors, both major and 
minor. If these are regarded as "serious" errors, then, 
obviously, the number of "serious" errors per piece of writing 
was considerably less, on the average, than the total number of 
errors. A decision as to which figure, 7.1 or 1.9 (obtained as 
the sum of 0.8 and 1.1), more truly reflects the writing 
capabilities or the writing incapacities of students requires a 
more or less arbitrary judgment that each reader must make for 


himself. 


C. Features of Organization and Style 


In the foregoing section, our attention was restricted entirely to 
a negative aspect of student writing performance, frequency of 
error. It is evident that, though error frequency is one means of 
characterizing writing, it is by no means the only factor 
contributing to the overall quality of a piece of writing. There 
are many other more positive characteristics of writing that must 


also be considered. 


Obviously, in discussing "organization", "effectiveness of 
argument", "logic" or "style", we are concerning
ourselves with matters of judgement not particularly amenable to 
quantification. But it does not follow that these aspects of 
writing should therefore be ignored; otherwise a crude and most 


unfortunate bias would be built into this report. 


In order to provide some useful generalizations concerning
the quality of writing as seen from these less well defined 
standpoints, our sample of 50 essays, for which we had highly 
reliable scores, was expanded to one hundred, the others being 
drawn in equal numbers from each decile in the distribution of 
essay scores. The scores on these additional 50 essays were based 
on only three scorers' ratings rather than thirty-six;
consequently, the original 50 will be normally drawn upon as 


exemplifying writing at different points of the scale. 


Generalization from rather loosely defined characteristics 
of writing in only a selected sample of 100 essays, or even in a 
sample as large as our total number of essays (approximately
1600), lacks a good deal in scientific rigour; we admit and 
regret this, and would recommend a more sophisticated analysis of 
a much larger sample. Nevertheless, the essays written for the 
study remain a fund of important information on student writing 
performance at the interface and can serve as useful baseline data 


in future studies. 


The principal resource drawn upon in this study is a study
of the scoring of writing that was done in 1961 for the
Educational Testing Service by J. French, S. Carlton and P.
Diederich as reported in Diederich (1974). In
that 1961 study 300 essays were scored by sixty readers in six 
occupational fields by a method resembling our holistic scoring 


procedure in the present study. Subsequently, 


"this table (of scores) was subjected to...'factor 
analysis' which has the effect of picking out clusters of 
readers from all over the table who agree within their 
cluster and disagree with every other cluster to a greater 
degree than could be attributed to chance. In effect, it
determines how many different schools of thought exist 


among the readers as to what constitutes excellence in 
student writing. In this study we found five different


schools of thought..." 


Diederich (1974, p.6) 


Further analysis, by examination of the comments of these 
clusters of readers, revealed that the scorers in the largest
cluster were most influenced by the ideas expressed: richness, 
soundness, clarity, development, and relevance to the topic and 
to the writer's purpose. The remaining clusters, in descending 
order of size, were concerned with usage; sentence structure; 
punctuation and spelling; organization; diction--the quality of 
wording or phrasing; and flavour or style--with reference to the 


personal qualities revealed in the writing. 


In anticipation of the present analysis, we checked with 
two groups of participants in the study--the thirty-six scorers 
and the teachers who answered the test appraisal inventory--the 
priorities which concerned them in the writing of students at 
this level. 


Following a trial scoring run, prior to their receipt of 
the Interface essays, scorers were asked to list the criteria 
most important to them. Their replies--in order of priority--may 


be summarized as follows:
(a) Organization 
(b) Mechanics: usage, sentence structure, punctuation, etc. 
(c) Presentation of Argument: validity, evidence, balance, 
etc. 
(This criterion was in part dictated by the mode 


assigned: an expository essay presenting a point of 


view.) 
(d) Writer Commitment: involvement, writer's own interest


(seemingly related to Diederich's "flavour") 
(e) Style and Diction 


In the Test Appraisal Inventory, teachers, secondary and 
postsecondary, were asked to place in order of priority the 
following criteria as applied to the mode of writing assigned to 


the students: 


Organization; Logic, use of evidence; Style (chiefly the
sentence); Grammar, usage, mechanics; Diction


Responses, when averaged over appraisers, placed 
Organization strongly in the first position, followed by Logic
(the "mode-specific" criterion). Secondary school teachers placed
Style third and Grammar, etc. fourth, followed by Diction.
Postsecondary teachers reversed the order of Style and Grammar. 
Though many additional criteria were suggested, most fell loosely 


under one or other of the foregoing classifications. 


A further verification of the significance of Paul
Diederich's "factors" is found in the former Grade Thirteen 


departmental examinations' criteria for the "Middle First": 


The middle first is excellent in content and style. Some 


of its merits are comprehensive and intimate knowledge of 
subject matter (IDEAS), freshness of thought (FLAVOUR),


skill in arrangement (ORGANIZATION), command of sentence 


structure, and consciousness of word values (DICTION). 


S4C - Underlining and bracketed remarks 


are ours. 
Finally, and independently of the present study, the Queen's
University report on student literacy (Norman, 1976) has recently
been made available. There (pp. 72-73) eleven "basic components
of literacy" are suggested. Four of the eleven "components",
ranked as #5, #6, #7 and #10, concern effective presentation of
argument: logic, detachment or objectivity, respect for facts,
effective generalization. Others which strongly echo the
priorities that appear to be established from our searches are
(rank is Norman's):


2. The ability to write an assignment that is well 


organized... 


4. The ability to write grammatically, in a style which
is free of all but minor errors of spelling,
punctuation, grammar and vocabulary, and which makes
proper use of English idiom.


1. The ability to write sentences which follow logically 
and meaningfully from one another, and paragraphs which 
are coherent and unified. (Appearing to combine style 
(re sentence structure) and organization again (re 


paragraph)) 


No. 3 (fluent, clear, concise writing) and No. 8 (immunity to
slang, jargon, and colloquialisms) appear to be concerned primarily
with diction.


The substance of these excursions into the realm of writing 
quality is that there is rather considerable agreement concerning 
the positive attributes of literate writing and concerning those 
attributes most relevant to postsecondary success. Further, 
Diederich's original classification, if not necessarily his order 
of priority, is validated elsewhere, and in our case with one 


important modification. 


The modification is this: that as the mode assigned
specifically required the development of an argument or point of
view, Diederich's first priority, "ideas", should be
characterized more specifically as being concerned with point of
view expressed, and the effectiveness of the logic and evidence
employed to substantiate it.


The sample of 100 essays therefore was examined from the
following standpoints, making flexible use of a Diederich-type
analytic scale (Diederich, op. cit., p. 54) with reference to the
positive characterizations of writing as described. The
classifications concerning conventions and mechanics were omitted,
as they had been separately examined in the error count.


The following criteria (elaborated, with variations, from 


Diederich, pp. 55-57) were employed: 


1. ORGANIZATION (from Diederich, unmodified)


High: The paper starts at a good point, has a sense of 
movement, gets somewhere, and then stops. The paper 
has an underlying plan that the reader can follow; he 
is never in doubt as to where he is or where he is 
going. Sometimes there is a little twist near the end 
that makes the paper come out in a way that the reader 
does not expect, but it seems quite logical. Main
points are treated at greatest length or with greatest 


emphasis, others in proportion to their importance. 


Middle: The organization of this paper is standard and
conventional. There is usually a one-paragraph
introduction, three main points each treated in one
paragraph, and a conclusion that often seems tacked on
or forced. Some trivial points are treated in greater
detail than important points, and there is usually some
dead wood that might better be cut out.


Low: This paper starts anywhere and never gets anywhere. The
main points are not clearly separated from one another,
and they come in a random order--as though the student
had not given any thought to what he intended to say
before he started to write. The paper seems to start
in one direction, then another, then another, until the
reader is lost.


2. EFFECTIVENESS OF ARGUMENT OR VIEWPOINT 


High: The writer provides a balanced argument or viewpoint,
treating the subject with an appropriate detachment and
some originality, and the argument develops to an
appropriate conclusion based on the evidence. Judicious
use is made of examples and other forms of evidence,
and generalizations are supported. The argument or
viewpoint develops logically. The writer reveals
commitment to this position.


Middle: The writer tends to rely too much on unsubstantiated
generalizations and general clichés of thought. The
viewpoint expressed may be rather one-sided without
adequate regard for alternative positions. Transitions
between stages of the argument may not always be
clear. The student may evince little commitment to
what he is saying and betray a low interest in the
subject.


Low: Generalizations and thought clichés abound with little
regard for substantive evidence. The argument may be
heavily biased or, perhaps worse, the writer may
attempt to present all sides without ever taking a firm
position and the essay rambles to no conclusion in
particular. There is little awareness of logical
development and the arguments themselves are tired.


3. STYLE (SENTENCE STRUCTURE)


High: Sentences are well varied in relation to purpose. Less
important ideas are effectively subordinated and there
is clear evidence that such devices as parallelism are
used to good effect. Sentences are varied in length and
made economical in expression through ellipsis and
tautness of phrase.


Middle: Though some variety in sentence structure is evident,
the variety is not particularly well related to effect.
The writer appears unaware of the effective use of
stylistic and rhetorical features of language. There is
limpness of style with occasional minor errors.


Low: No awareness of style or variety in sentence structure
is evident, and errors, major and minor, do occur.
Sentences tend to ramble to no purpose and expression
is woolly or redundant.


4. STYLE (DICTION) (from Diederich, unmodified) 


High: The writer uses a sprinkling of uncommon words in an
uncommon setting. He shows an interest in words and in
putting them together in slightly unusual ways. Some of
his experiments with words may not quite come off, but
this is such a promising trait in a young writer that a
few mistakes may be forgiven. For the most part, he
uses words correctly, but he also uses them with
imagination.


Middle: The writer is addicted to tired old phrases and
hackneyed expressions. If you left a blank in one of
his sentences, almost anyone could guess what word he
would use at that point. He does not stop to think how
to say something; he just says it in the same way as
everyone else. A writer may also get a middle rating
on this quality if he overdoes his experiments with
uncommon words; if he always uses a big word when a
little word would serve his purpose better.


Low: The writer uses words so carelessly and inexactly that
he gets far too many wrong. These are not intentional
experiments with words in which failure may be
forgiven; they represent groping for words and using
them without regard to their fitness. A paper written
in a childish vocabulary may also get a low rating on
this quality, even if no word is clearly wrong.


5. FLAVOUR (from Diederich, unmodified) 


High: The writing sounds like a person, not a committee. The
writer seems quite sincere and candid, and he writes
about something he knows, often from personal
experience. You could not mistake this writing for the
writing of anyone else. Although the writer may assume
different roles in different papers, he does not put on
airs. He is brave enough to reveal himself just as he
is.


Middle: The writer usually tries to appear better or wiser than
he really is. He tends to write lofty sentiments and
broad generalities. He does not put in the little
homely details that show that he knows what he is
talking about. His writing tries to sound impressive.
Sometimes it is impersonal and correct but colourless,
without personal feeling or imagination.


Low: The writer reveals himself well enough but without
meaning to. His thoughts and feelings are those of an
uneducated person who does not realize how bad they
sound. His way of expressing himself differs from
standard English, but it is not his personal style; it
is the way uneducated people talk in his neighborhood.
Sometimes the unconscious revelation is so touching
that we are tempted to rate it high on flavour, but it
deserves a high rating only if the effect is intended.


No generalizations based upon an examination of approximately 
100 of some 1600 essays can be pressed too far. There are, even 
within the sample studied, exceptions to most of our generalizations. 
Nevertheless, some interesting patterns do emerge, sufficient, we
hope, to lead to some useful reflection on the character of student 


writing in Grades Twelve and Thirteen. 


Just as 100 essays drawn from across the range cannot be said 
wholly to typify writing at different points of the range, so, too,
it must be remembered that these essays represent work of students 
ranging quite widely in their present level of schooling and their 
plans for the future. In addition to Grade Thirteen students (many 
anticipating university), Grade Twelve students at both general and 
advanced levels, preparing for work, for further education at CAATs, 
or for another year of secondary school before university--all
contributed to the essay sample. The exemplar essays have been 
selected, therefore, to illustrate some of the characteristics of 
writing and their presence (in degree) or absence in the sample 
across the range, not as good or bad examples of writing nor as a 


basis for saying "That's how (all) senior secondary students write." 


After a brisk reading of the essays, a selection was made of 
those that fell at particular points along the scale. These essays 
were carefully examined and scored according to the criteria 
described above, using a 1-5 scale (1 = low; 5 = high) for each
criterion. The first purpose was to determine whether and at what 
points along the scale essays began to meet individual criteria 
adequately. Secondly, we wished to see whether some criteria were 
met by essays scored low as well as high, with little pattern. When 
scores on the Diederich criteria were compared with essay marks, we 
found, as expected, that scores on specific criteria tended to rise 
as the scores on the holistic scale rose; however, a number of
irregularities in the pattern made it clear that strength in certain
characteristics, notably quality of argument and flavour, was by no
means the exclusive preserve of those writers scoring near the top of
the scale. As well, the rate of improvement as related to the
holistic scale was different for different criteria, as noted in the
following comments.
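
The comparison just described (criterion scores on the 1-5 scale set against holistic essay marks) can be sketched in a few lines of Python. This is an illustration only, not a record of how the scoring was actually carried out: the four essay records are invented, the 6.0 cut-off between a "lower" and an "upper" band is an assumption made for the example, and the names `band`, `mean_by_band` and `irregular` are hypothetical.

```python
from statistics import mean

CRITERIA = ("organization", "argument", "sentence_style", "diction", "flavour")

# Hypothetical records, NOT data from the study:
# (holistic essay mark, {criterion: score on the 1-5 scale}).
essays = [
    (3.0, {"organization": 2, "argument": 4, "sentence_style": 2, "diction": 2, "flavour": 3}),
    (5.5, {"organization": 3, "argument": 3, "sentence_style": 2, "diction": 3, "flavour": 4}),
    (6.0, {"organization": 4, "argument": 3, "sentence_style": 3, "diction": 3, "flavour": 2}),
    (8.0, {"organization": 5, "argument": 5, "sentence_style": 4, "diction": 4, "flavour": 4}),
]


def band(mark, cut=6.0):
    """Assign an essay to the lower or upper band (the cut-off is an assumption)."""
    return "lower" if mark < cut else "upper"


def mean_by_band(records, cut=6.0):
    """Mean score on each criterion within each band of the holistic scale."""
    grouped = {}
    for mark, scores in records:
        for criterion in CRITERIA:
            grouped.setdefault((band(mark, cut), criterion), []).append(scores[criterion])
    return {key: mean(values) for key, values in grouped.items()}


def irregular(records, cut=6.0, strong=4):
    """Essays marked below `cut` that still score `strong` or better on a criterion."""
    return [
        (mark, criterion)
        for mark, scores in records
        if mark < cut
        for criterion in CRITERIA
        if scores[criterion] >= strong
    ]
```

With records of this kind, `mean_by_band` shows whether a criterion rises with the holistic mark, while `irregular` lists the lower-band essays that are nonetheless strong on one criterion, which is the kind of irregularity noted above for argument and flavour.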


(a) Organization: Essays at the bottom of the scale generally 
exhibited very poor organization, often with poor and infrequent 
paragraph division. Conclusions, if reached, tended to be very limp 


and sometimes essays trailed off altogether. 


However, in all the higher parts of the range examined, 
essays, whatever other weaknesses they had, did exhibit adequate to 
good control of paragraphing and evidence of reasonable planning. 
Though there was a failure in some to give due weight to the more 
important aspects of the argument, they were reasonably coordinated 


and developed towards a conclusion. 


(b) Argument: Quality and force of argument with appropriate 
illustrations and balanced treatment proved to be much more unevenly 
distributed. A few of the otherwise weakest essays did show some 
strength in this regard, though most of the essays that with some 
consistency could be praised for quality of argument were found in 


the top portion of the score range. 


The general weaknesses exhibited included reliance on 
superficial generalizations and lack of adequate evidence in support 
of statements. Some writers failed to discriminate between less and 
more important aspects of the cases they were making and tended to 


introduce redundancies or irrelevancies. 


(c) Style--Sentence Structure: Serious errors in sentence
structure were infrequent right across the range (see preceding 


"error count" report); on the other hand, attention to sentence style 
appeared to be generally lacking in essays scored below 6.0. Though 
there was some variety in sentence length, order and complexity, it
appeared rather random, without consciousness of effect.


Use of parallelism and antithesis, which might be expected
frequently in the presentation of an argument or viewpoint, appeared 
very rarely except in the essays rated near the top. In these last 
essays, there was evidence of tautness and control; in the less 
distinguished essays, sentences tended to ramble and insufficient use 


was made of means of subordinating less important ideas. 


(d) Style--Diction: Diederich's description of the dull average, 
"The writer is addicted to tired old phrases and hackneyed 
expressions," fairly characterizes the writing in a high proportion 
of the essays examined. The search for the exact word, with a few 
notable exceptions, simply was not evident. It may be speculated 
that students were insufficiently motivated to put forward their best 
efforts; nevertheless, the lack of colour and vigour in diction and 
imagery is discouraging. The most popular topics (#1 on Violence and 
Censorship and #4 on Course Options--Secondary) particularly appeared 
to encourage dullness. Pretentiousness of diction was rare (and 
almost welcome); those students who wanted to show off did so in 


other ways. 


(e) Flavour: The same topics (#1 and #4), though sometimes 


argued strongly, had generally a staleness about them.


"Flavour" was not entirely concentrated in the superior essays.
Some middle-range essays, undistinguished in other respects, came 
alive through the evidence of personal commitment on the part of the 
writer, while some of the essays in the top of the range, and 
correct in most respects, lacked originality and freshness. While 
flavour was more commonly found in the best-rated essays than in 
those with lower ratings, no general statement about the distribution 


of flavour across the range is possible. 


The absence of flavour in so many essays may be a reflection of 
the testing situation, in which motivation was low and students could 
expect no writer-audience interaction. It would be unwise, 
therefore, to conclude that most senior secondary students are 
without the ability to produce flavourful prose. In fact, the most 


interesting and enjoyable examples of writing sometimes came from
students who refused to take the test seriously and so entered into a
sort of devil-may-care dialogue with the anonymous examiner. The
freshness and breeziness of style, while perhaps a little too much of 
a good thing (as in a following example), produced an appealing 
personal ingredient otherwise generally lacking. That this occurred 
with some frequency underlines the importance of developing "real" 


writing situations in order to measure more effectively student 


writing ability. 
EXEMPLAR ESSAYS 


These (as noted earlier) are not meant to typify the student
writing but to illustrate some of the writing qualities referred to 
in the criteria so that the trends can be better understood. Comments 
on errors in spelling, grammar, etc. have been excluded, as error 


counts have been treated separately. 


FIRST EXAMPLE 
(SCORE [illegible])


TOPIC: Secondary schools should not demand that students 
take particular courses or pre-requisites; students 


should be free to choose courses of interest to 
them. 


The debate of whether students should have a 
choise by cources they can take in high school, can have 
two sides. On the pro side, there would be most 
students who are about to entre or have just entered the 
secondary level of education. The reason for this is 
quit simple, people want as much freedom as they can 
get. I know, several years ago as a student in grade 
eight, the idea of freedom of choise was quite
appealing. But today, I can see how easily it could of 
been for me to make a serious error in course choises. 
This occured to me only after it was to late to make 
any alterations. Forcunatly, though, I did make the
right choices, and therefore, I can continue my
education as I wish. But other people I know have not
been so fortunate. They made the wrong choices when they 
first entred high school, and today they cannot follow 
the course of education they wish. The main reason for 
this occurance is that students are made to make this 
desision when they are quite young and do not fully 


realize the importance of the desision. 


On the con side of the arguement are the older 
people or people who have finished thier education. 
These people on the whole would like to see the cources 
pre-set and compulsory to all students. In this way, 
they make the assumption that this will result in better 
educated students to enter university. This is a certain 
degree can be a valide point. But there is one falicy to 
the arguement, students on the whole will do better in 
subjects they enjoy taking than in subjects that do not 
perticularly appeal to them. I know personaly that 
subjects I enjoy taking Peraee as much as twenty
percent higher than subjects I don't perticularly enjoy. 
This I emagine can be said for all students. The result 
therefore is that if you make students take many 
subjects that they do not enjoy, then they will not work 
as hard at them and thier marks will generaly be lower 


than if they had choosen the course themselves 


At this point, it may appear that I am sitting on 
a fence and not making any altenative methods. But I 
feel that a compromise can be made between the pro and 
the cons. I feel that some cources should be compusory. 
These subjects are English, mathematics, History and 
science. English is an important pre-requisite since 
almost all university require english before they will 
accept graduates. In addition, english is a subject that 
makes the backbone of a persons intelegence. If a person 
has many good ideas and thoughts, he can be considered 


intelligent, but if he cant express his ideas in an
intelecuale fasion, than he may not appear to be an
intelecual.


Mathematics is an important subject becaus for 
graduates to get into university under various cources in 
sciencses and engenering and medicine, he may need 


mathematic cources from secondary schools. 


History is a necessary cource for students to 
take. Of all the cources you take at high school, 
history has one of the most practical uses in every day 
life. If a person ever gets involved in a discusion, the 
knolage of historical events may be very useful in 


defending a point. 


Science is also an important course that should be 
taken, at least in the level one grade, because it gives 
the student a better understanding of the world around 


him. 


I feel that all the cources I have stated above 
should be taute at the advanced phase level until the 
end of the level two year. In this way students are 
much more aware of their own goals they want to take, 
whether it being university or colages of applied arts 
and technology. After the level two year, I think that a 
greater freedom of choice of cources should be given 


with english and mathamatics being compulsory. 

Therfore, if a compromise between the two sides is 
drawn with the outline above, the level of education 
will be bettered without the expen(illegible) of the 
freedom of the student being taken away. 


COMMENTS ON FIRST EXAMPLE 


This essay, though replete with mechanical errors, illustrates
the point that essays relatively low on the scale did show some
effectiveness in organization and paragraphing. It follows a
"Here's one side; now here's the other" pattern, followed by
development of the writer's own position in some detail and with 
reference to several subjects the student considers important. And 
even if the conclusion is limp, it does provide a summary 
statement as the student steps back from particulars to a final 


generalization. 


No doubt paragraphing could be more effective; a better 
design might have been to deal with specific subject areas in 
fewer paragraphs, but there is at least a sense of "one idea--one 


paragraph". 


The tendency to weak generalizations is illustrated by this 
essay: "On the con side of the argument are the older 
people...(who) would like to see the courses pre-set and 
compulsory to all students," as if there were clearly two camps 
of the human race, entering Grade 9-ers and "the older people", 
the one group holding one position universally, and the other, the 
other. What the student refers to as the fallacy of the latter 
position (that "I", and likely most other students, do well in 
courses I like) ignores the possibility that students tend to 
"like" the courses they do well in; i.e. have the capacity for. 
On the positive side, the student does attempt to be specific in
reference to his own experiences and in noting specific subjects. 
A reasonable sequence of reasoning can be followed and there is 


balance in the argument. 


In style, both sentence structure and diction, the essay is 
wholly undistinguished, characterized by a lack of terseness and 
little consciousness of word values. For example, the writer adds 
an explanatory sentence, "The main reason for this occurence is
that students are made to make this decision when they are quite
young...," rather than attaching the explanation to the previous
sentence with a linking phrase such as "chiefly because".


The essay has some flavour: one senses a commitment to the
position taken and evidence of deliberateness in its development. 
The personal note is frequent and desirable; without it the essay 


would be extremely dull. 


SECOND EXAMPLE 
(SCORE 5.5)


TOPIC: Should scenes of violence in books, in films, and 


on television be censored? 


It is hard to say whether cencorship should exist
at all or to what degree. If everyone was of sound-mind 
and basically a good character we wouldn't have to worry 
about it. Evidently however even all us Canadians don't 
qualify for the sound-mind bit. Unfortunatly also not 
all the ones deserving a rubber room have one either so 
were left with a lot of good people a few harmless 


crazy ones and a few who aren't so harmless. 


I've read a lot of books and enjoyed most of them.
I'm just trying to think of them without their crime
festered plots. They'd probably be really dull and boring 
for most people. Maybe it would prohibit one idea of the 
perfect crime but Canadian illeteracy % would likely 


skyrocket. 


Last year, several schools throughout Canada and 
the United States were plagued with shooting incidences. 
A mentally disturbed student would enter a school and 
open-fire in the halls and classrooms. Many people were 


killed and everyone seriously frightened.


Undoubtedly, the tremendous amount of coverage on
the topic and the general outroar from society against
this particular crime somewhat encouraged these
students. It really makes you wonder whether were not
all to blame. At the beginning of each new television 
season we witness more and more crime. Our need for 
gory deaths and drastic amounts of blood never seems to 
be fulfilled. Sometimes changing the channel doesn't help 
much either. Our taste for good "shows" is becomming 


warped and contorted--we like crime!-- 


However, I'll admit we're humane. Sure enough 
after the above mentionned killings new laws prohibiting 
certain guns and rifles were introduced. How well they 
are enforced and by whom is up to us. We really can't 
expect too much if we think well you know--my son can 


have a rifle--he's got it all upstairs! 


Canadian crime statistics have increased over the 
past few years along with the population. Provincially 
Ontario came out number one again and not by a small 


persentage. 


If you were to ask a father what crime he
considered the worse, he'd likely tell you, rape. 
thinking back tell September, I can recall hearing about 
or seeing at least twenty shows involving rapes. Is it 
so odd then that the number of rapes is also ascending? 
Whether it be good old-fashioned Dr. Welby, Medical 
Centre, Policewoman or movies, we cannot escape the 
present trend. We'd all likely vote rape in our top five 
most dreaded crimes for women and we'd disown a raper 


son. 


It use to be, a women had to prove her innocence 
when she pressed rape charges. Her history was gone over 
and if possible she was made to look like a real tramp. 
The raper was presented to be the all-american boy. 
Recently however, a new bill was passed whereby a 
woman's history would not be revealed when dealing with 


this sort of crime. Instead the pendulum swings the
other way and the raper must try to prove his innocence.
More of this should be shown on television. It might 
serve to discourage would-be rapers if they thought 


their chances of getting caught were a lot higher. 


I'm just not content to say well were not as bad 
as the States! 


Presently Judy LaMarsh is heading a committee in
charge of censoring crime on T.V. I think it is a
worthwhile cause if done properly. Right now, I find
Canadian T.V. censorship too lax. For example
"Deliverance" was on T.V. not too long ago, the next
day I saw one little boy grab the ear of another kid and
told him to squeal like a pig!


Everyone knows that even little kids don't go to 
bed right after dinner so shortly after or even family hour
doesn't really make a difference in the age of the 


audience. 


I find that many movies shown at theatres which 
are restricted have less to offer in the way of crime as


do many T.V. shows. 


I agree censorship should exist but to certain 


degrees. 
COMMENTS ON SECOND EXAMPLE 


There is an organizational thread running through this essay, 
though the writer's view of the topic does not become especially 
clear until late in the essay, because he (she) jumps out early 
into a number of sociological dimensions, presumably effects of 
excessive TV violence, before coming back to state the argument 
or case. There is, then, a plan, though one that could be made 


more effective.


Here, however, we do see improvement in the effectiveness
of argument, though the essay remains somewhat disjointed and 
some of the assumed causal connections are rather far-fetched 
(just as they often are in magazine or newspaper articles on the 
censorship of violence). He is reasonably concrete in his
illustrations: note especially, paragraph ten. And among his 
reflections emerges the possibility that it is not our television 
that is to blame, but ourselves--the society and what it values. 
There are insights that prevent him from making blanket 


generalizations or suggesting simplistic solutions. 


There appears to be a genuine attempt to vary sentences for 
effect, perhaps an over-reliance on exclamation marks. Diction is 
uneven in quality; sometimes words are chosen with precision, at 
other times the choice is overly colloquial and there are actual 


errors. (Or is "outroar" an interesting coinage?)


Though there are many matters to fault in the essay, the 
writer has compelled the reader's interest with some touches of 
humour (even if rather trite) and with his directness. Through the 
writing we catch something of the writer as a person, a feature 
rather infrequent across the sample of essays examined. The essay 
does indicate that "flavour" is not a quality of writing
exclusive to the best essays.


THIRD EXAMPLE 
(SCORE 6.0) 


TOPIC: What are the values of the Olympic Games, not
only for the competitors, but especially for the
countries they represent and for the world?


In this essay I will be discussing the values in
the Olympic Games. Not only for the competitors, but
especially for the countries they represent and for the
world.


The year 1976 is an important one for Canada.
This year, in Montreal, Canada has the honour of holding 
the summer Olympics which will commence in July. The 
Olympic Games have had excellent results in their money 
raising campaigns through the Olympic lottery. This is 
one source of money for the construction of the
buildings many will see this summer. The second source 
of money, unfortunately comes from the Montreal 


tax-payer's pocket. 


The Olympic's value in a country is the union it
brings. Every year either in summer or winter the games 
that are held in a chosen country unite all countries of 
the world. Even if there are four athletes present to 
participate in games there is no limit to the amount of 


persons that should represent their country. 


The Olympics this year in Canada will bring the 
awareness of how important sports are for Canadians. 
Millions of Canadians are not fit, hopefully the Games 
will motivate them to take up one of the numerous sports 
offered.


The Olympic Games also bring in tourists from all 
the corners of the world. This not only provides wealth 
for the country, particularly the city hosting the games 
but also provides an image of beauty for a country. Once 
the visitors came to a country that shows warmth and 
friendship as seen in Canada there will be a want for 


return or a recommendation to their friends. 


For those Canadians who do not attend the Olympic 
Games, they will be able to see the buildings in
Montreal whenever they wish. The buildings in Montreal:
the veladrome, the centre arena, and the Olympic Village
will remain there, in memory of the honour, beauty and 


union that the Olympics once offered in Canada.


The Olympics for Canada brings a union within the
country. In recent years there has been a major problem 
of bilingualism. This is heard in the province of 
Quebec. They feel Quebec's major language should be 
French which has faded away in the last couple of years. 
The Games will provide a unity for the east and west 


side of Canada. 


The value the competitors take in the Olympic 
Games is the most exciting experience in their life. 
The feeling of representing their country in seen all 
over their face. Even if the athletes do not gain a 
medal they are not sore losers but feel they gave it a 
good try and besides everything else it was worthwhile. 
The athletes of each country get to communicate and 
learn about other cultures of their competitor. There is 


friendship between two countries shown by the athletes. 


The athletes show the importance of a particular 
sport in their country. This is shown by their training, 
skills, and a medal usually. For example: In West 
Germany, sports play a major role. After work every 
individual has a particular activity they take up. This 
shows the importance of fitness in that country.
Athletes work very hard for long hours day after day. 
They eat certain foods, and not too much entertainment 
is allowed on weekdays. Almost everything is under 
strict control. These results lead to the other country's 
awareness of a higher aim for the next year. Either 
winners or losers all competitors rate the same to each 
other. They know due to experience the hardship one must 


go through to enter the Olympic Games. 
Therefore we may conclude that it is through the 


competitors and the country as a whole that bring out 


the values of the Olympic Games. 
COMMENTS ON THIRD EXAMPLE


This essay, though reasonably adequate in organization, 
illustrates the general blandness of the writing in the middle of 
the range. Viewed from the standpoint of errors, mechanical and 


other, this essay appears to be more effective than it really is. 


The naïveté of the argument and lack of substantive support
are obvious at once; views of the Olympics as a cure for
bilingual problems, an expression of world friendship authored by
the host country, and an event with real impact on sports and
fitness all seem rather futile (though the reader must remember
that the essay was written before the Games). The writer has
"covered" the topic, but thinly, and with little sense of
direction or purpose. The concluding sentence is "limp"--just a 


sign-off. 


There is little awareness of sentence style (note the 
choppiness of the many sentences in the ninth paragraph), and 


diction and imagery remain wholly abstract. 


The blandness of effect is explained and reinforced by the 
absence of flavour; there is no sense of personal commitment or
interest. The one touch of humour (last sentence of paragraph two) 


stands out only because nothing else in the essay does. 


While many commentators on student writing at the interface 
have been greatly concerned about error frequency, the frequency 
of such apathetic writing perhaps should be a matter of even 
greater concern. Doubtless the test circumstances were a 
significantly contributing factor in that they did little to
stimulate writer commitment; nevertheless, the dryness of so many 
essays suggests that more might be done in schools to encourage 
commitment and vigour in writing. The student with a stake in 
his ideas and with a richer sense of audience is the one who is 


likely to write purposefully and effectively. 
FOURTH EXAMPLE
(SCORERGS)) 


TOPIC: Should scenes of violence in books, in films, and 


on television be censored? 


Have you read Stranger in a Strange Land by Robert 
A. Heinlein. If not, get up off your ass and go buy a
copy. On second thought stay there and I'll tell you the 
pertinent point. (How's that for alliteration?) Heinlein 
says that man is the only creature that laughs and that 
mankind laughs "because it hurts". This a very emotional 
world with an abundance of pain, both physical and 
mental. Is pain an emotion? A feeling? No, it is an
intangible but ever present. Pain is a guillotine over 
your head that is ready to drop at any instant. Pain 
hovers above you at all times and each one of us becomes 
changed in some way due to our efforts to avoid its


presence. 


Does violence warp a child's mind? Does he become
"strange" because of his efforts to avoid or perhaps
even seek the pain he has been exposed to? I was weaned 
on violence and look how well I turned out. Or did I? 
Now turn around and look at the occupants of the world's 
prisons and asylums. And graves. Most of those people 
weren't born that way: They were changed, warped by 
society. Somewhere along the line they were twisted by 


their environment. 


It is time to try a little preventative medicine. 
We can never abolish violence but we can try control the 
market we sell it to. Go ahead and censor the violence 
on television during the hours that children watch. 
Censor that idiot box while you're at it. But remember 
that if you initiate your censorship program you will 
put a lot of industries out of business. Certainly the 


cop shown would survive: there will always be an adult 
market for such trash. But what about the cartoons and
comic books. Even Walt Disney films have violent 
scenes. Those comic books are really incredible: "Triple 
A rating--No sex, no drugs, no foul language", just
picture of three mice going through a meat grinder and a 
cat swallowing a bundle of dynamite. When was the last 
time you watched Bugs Bunny? Did you see him shoot 
Elmer in the face with a shotgun? Funny eh? Have ever 
laughed at a friend who just fell on some slippery ice? 
Don't deny it, you thought it was hilarious...you
laughed because it hurts. The Three Stooges built a 
comedy empire on pain and violence. Think of the last 
good joke you heard. Chances are that if the joke wasn't 
dirty it's meaning was actually rather sad. If you 
eliminate violence you will eliminate a great deal of 


laugher too. 


I guess you realize that I haven't taken a firm 
stand on this issue. I believe that we would solve a lot 
of emotional problems, clear a lot of mental wards, if 
we were able to stop teaching violence to impressionable 
youngsters. But I can't be sure that those children 
would be able to cope with the violence they are sure to 
meet in the jungle. I'm not sure that I could stand


to see people living on half-rations of laughter either. 


I'm laughing now because it sure as hell hurts. 


COMMENTS ON FOURTH EXAMPLE 


This essay is included to illustrate the observation made in the 
introductory comments that some of the most interesting and 
flavourful writing was exhibited where the student refused to 
take the test too seriously. The writer-reader relationship 
changes dramatically from a dreary "writer to examiner" style,
illustrated in the previous essay, to more of a "writer to his 


peers" style. 
Probably because this student has broken out of the mold, he
provoked a wide variety of responses from the scorers who rated
the essay all the way from 3 to 10 on the scale. Apart from the
mechanical faults, evidently some scorers reacted strongly and
negatively to abrupt shifts into a language register generally
colloquial and of questionable appropriateness. Other scorers
likely over-reacted in the opposite direction, simply out of
relief at the gasp of fresh air this writer provided. No one can
say the essay is weak on flavour, even if the flavour is too


strong for many. 


Leaving aside "register", there is an unevenness in the
writing and argument. The quasi-philosophical reflection on pain 
seems partly irrelevant. And flippancy is frequently
indistinguishable from mere carelessness. There is a logic to the 
essay but the personal intrusions and illustrations, even if 
germane, are introduced in a way that partly obscures the


direction of thought. 


On the other hand, many of the jabs are good ones. Are not 
the observations on contents of comic books sharp and fair? And 
the writer is not the first observer of the human scene to note 
the interplay of pain and laughter. On a second reading, it is 
clear that this student operates well in mingling the concrete and 


specific with the universal. 


Diction, barring the "bottom of the vernacular" descents, 
is generally precise if not outstanding, and certainly the writer 


uses sentence style to create effects. 


A quick and vigorous mind is at work here, revealing itself 
in somewhat heterodox ways. And even if these are perceived as 
somewhat extreme, the essay suggests that if a more effective 
stimulus were provided, one that placed writer and intended 
reader in a more intimate relationship, rather different and much 


more interesting qualities of writing might emerge. 
FIFTH EXAMPLE
(SCORE 8.0) 


TOPIC: Does art imitate life, or does life imitate art? 


A piece of art that imitates life is not a true 
work of art. A person that lives his life imitating art 
is not truly alive. Art and life are not carbon copies 


of each other, but rather reactions to experiances. 


When a author sets out to write a book he doesn't 
regard events. That is the reporters job. Instead, the 
author writes about reactions to the events, from inside
him rather than from outside. The artist who paints a 
picture does not try to imitate the scene. A camera 
will serve that purpose. The artist will attempt to
record his reaction to the scene presented. The ballet 
dancer can perform the steps perfectly but unless his or 
her own interpretation of the steps is added, the 
performance is dead, simply an imitation of life. We 
have examples of the limitations. Every corner book
store has that one shelf of books Mother forbade you to 
look at. These are imitators and can hardly be called 
"art". The difference between true art and an imitation
of life is the difference between DaVinci and Playboy; 
between Robert Wagner and Alice Cooper; or between D. 
H. Lawrence and the man who sits in his attic apartment 


spewing out 5 book a week. 


Life, like art, is a series of spontaneous
reactions. By attempting to imitate anything, especially 
art, we inhibit these reactions and become only half a 
person, the biological part. What makes us human is not 
only our physical selfs but also our spiritual beings. 
Just as our body would reject an imitation, any attempt 
to supply imitations to the spirit would also be 
rejected. Living a life that attempts to imitate art 


would be like gazing at a photograph. Oh, the pictures 
are very nice but everyone is a bit too stiff and


unnatural, the smiles, a trifle to wide. 


As a carbon copy is fainter than the original, an 
attempt to imitate life as art would never be true. 
Life and art are similar as they are both reactions but 
reactions can never be recorded and replayed. Here 
today, gone tommorow is what makes life and art what 


they are. 
COMMENTS ON FIFTH EXAMPLE 


This topic, not a very popular one, produced some very
interesting essays. Those most interesting were the ones where 
the students attempted, as in the present example, to carry 
mental explorations somewhat further than maturity and knowledge 
warranted: truly an "essai" or a "try". Generally scorers treated


even relatively unsuccessful "tries" with generosity. 


This essay illustrates well the quality of essays placed 
near the top of the score scale. Though the essay is short it is 
characterized by tautness of language and organization. Statements 
of position, even if somewhat erroneous in fact, are put forward 
bluntly and unambiguously. This directness and absence of padding 
gives a freshness of flavour so frequently lacking in other essays 
and the choice of language is generally appropriate to the subject 


matter. 


Argument is well developed by use of contrast (life: art; 


artist: reporter; body: spirit) and use of illustration. 


That the student is reaching beyond himself is apparent from 
his conclusion; he is out of his depth and does not know quite 


how to reach shore. The reader must, however, admire the swim. 
In this essay, we see a more mature effort to relate 


sentence structure to purpose. If the writer had learned the 


semi-colon, he might have used ellipsis and antithesis to better 
effect. However there is a good rhythmic sense and usually


variety in structure relates to effect intended. 


SIXTH EXAMPLE 
(SCORE 9.2) 


TOPIC: A competitive environment develops strength of 


character. 


Today's society is to a great extent a competitive 
society. Early in life we are taught the importance of 
"getting ahead" and "striving to be better than the 
neighbour." Throughout our school system, "success" is
measured directly according to one's ability to achieve 
"more" than one's classmate. When looking for a job, 
one is forced to compete in order to make an impressive 
impression. Indeed, in a capitalistic society such as 
our own, the "competitive environment" is a predominent 
factor. Some people see no harm in this: philosopher 
Thomas Hobbes insisted it was "in man's nature" to 


compete. 


Is this so? Or does society force man to be 
competitive? And if so, does this competitive


environment develop strength of character? 


Because of the continuous indoctrination of
competitiveness throughout one's early life, it is 
difficult to lose the competitive nature. As a result, 
men are forever "looking out for themselves"; trying to 
better themselves; trying to "get ahead." In a
capitalist system, the ultimate result is the
exploitation of man by man, of class by class.
Individuals strive for personal, material gain. Instead 
of developing strength of character and humane values, 


men become obsessed with making money. Because our 
society is competitive, and because it places a high
value on material wealth, it naturally follows that men
strive to further personal wealth. The "profit motive"


overrides the importance of humane values. 


As a result, those who are unable to "compete" 
live often in worlds of mediocrity or even poverty, 
while those who are successful competitors live in
affluence and are looked up to by society. These 
"successful competitors" feel little or no obligation to 
enhance the living conditions of their fellow men. The 
competitive environment in which they have been brought 
up has conditioned them into believing that they owe 
nothing to their competitors, not even
consideration for the needs of those who are less


fortunate. 


Instead of building a strong character--one which 
has consideration for others--the competitive environment 
conditions man to be narrow-minded and concerned only 
with self gain. Moreover, the competitive environment 
creates hardships for those who are unable to compete. 
Because of our society's overemphasis of the competitive 
environment, many children go hungry and many elderly 
citizens are unable to live with dignity. Hopefully, our 
society will recognize this fact soon, and transform it 
from a competitive society to a society concerned with 


brotherhood. 
COMMENTS ON SIXTH EXAMPLE


This essay represents the top of the scale as available to us in 


the sampled essays. It requires very little comment.
Like the previous example, it is tautly constructed; there 


are no waste words or irrelevant ideas. Plan and direction are


always clear. 




In argument the student articulates his position clearly; 
however, as in most of the essays, supportive evidence or 
illustration is not available. In this essay, the logic of the 
case is there and the care for development has the reader nodding 
rapidly in agreement rather than looking for evidence. He would 
have to look hard, for the essay is built on generalizations about 
"people", "man" or "men", "some people", and "our society". The 
student provides a definition of "a strong character" which is 
rather noble, but it is not a definition that would win universal 


agreement and it is not supported. 


Flavour is good simply because the student feels strongly 
and expresses himself effectively. The reader, whether he agrees 


with the position or not, knows whom he is listening to. 


Sentences are varied effectively in relation to purpose. The 
use of a series of questions in a short paragraph is a very 
effective means of shifting from description to argument. The 
periodic sentence with the "lead-in" by means of subordination, 
if somewhat over-worked, is generally effective, especially in 


development of the argument sequence. 


The level of diction, if not exceptional, is generally 
suited to the topic, but marred by the student's tendency to


italicize (by quotation marks) too often. 


As in many of the essays, the conclusion is limp. In this 
instance, as in the previous example, the exaltedness of theme 


leaves the student floundering for an adequate conclusion. 

D. Postsecondary Appraisal of the Essays 

The fifty essays scored by all 36 scorers and for which, as a 
consequence, we have highly reliable scores were examined by 


instructors experienced with first year students at four Colleges 


of Applied Arts and Technology and four universities. 
The fifty essays were randomly drawn in approximately equal
numbers (26 Grade Thirteen, 24 Grade Twelve) from both grades 
involved in the Interface survey, and included the essays of Grade 
Twelve students either going on to Grade Thirteen, dropping out, 
or going on to postsecondary education, and of Grade Thirteen 


students going on to postsecondary education or dropping out. 


The essays were arranged by student code number rather than 
rank order and instructors were asked simply to read the essays 


and classify each as 


(A) Acceptable quality of writing for entry to program at 


this institution. 


(R) Remedial attention appears to be required, but some 


qualities of the writing appear promising for success. 


(X) Quality of writing not acceptable, even if some 
remedial attention were provided, for likely success in 


programs at this institution. 
A copy of their instructions is included in Appendix A1C. 


Our contact person in the English Department at each of the 
CAATs and universities selected was asked to find one person
experienced in teaching first year students in each of (i) the 
English Department, (ii) a humanities or social science area, 
(iii) a technical, business, or natural science area (one CAAT 
contact person found two judges from the English Department and 
only one from another department). The contact person made the 
selection and distributed the essays and instructions, providing 
any further explanations that seemed necessary. The resulting 


breakdown of teachers was as follows:

                        University    CAAT

English Department          4           5
Other Department            8           7

for a total of 24 teachers.


We were interested in determining the percentage of essays 
written by students in the different SSGD and SSHGD groups (see 
Chapter Three for a definition of the various groups) that would 
be placed in the acceptable, remediable and unacceptable 
categories by different types of instructors. As well, we were 
interested in whether there was any difference between English 
Department members' judgements and the judgements of instructors 
in other departments. The point here is that a great many 
students do not elect to major in English at postsecondary 
institutions: "literacy" acceptable for one department may not be 


the same, or nearly the same, for all. 


A word of caution is in order: Our classifications "A, R, 
X" are very loose ones and the number of instructors responding 
is quite small. One presumably could find a wide variation of 
opinion about the essays within a single department of an 
institution and a wide one as well among universities or among 
CAATs. The present analysis is inevitably a very tentative one: 


it is suggestive but by no means conclusive. 


Analysis 


For each postsecondary teacher, a separate analysis was made of 
the relationship between the teacher's categorization of the 
essays and the score derived from the 36 original scorers. This 
analysis consisted of two probit regressions, using the essay 
score as the independent factor and either of the following two 


dichotomies as the "response": 




(i) Acceptable versus Requiring Remediation and Not Acceptable

(ii) Acceptable and Requiring Remediation versus Not Acceptable.


The probit analyses provide functions relating score level to 
probability of response classification, and these functions depend 
on the response level (hardness/softness) of the classifier and 
also on the correlations between the classifiers' responses and 


the original scores. 


In order to see how the postsecondary teachers would have 
classified typical essays from the five basic groups of secondary 
students--that is, SSHGD-POSTSEC, SSHGD-OTHER, SSGD-SSHGD,
SSGD-POSTSEC, SSGD-OTHER--each probit function was combined 
with the distribution function of essay scores for each group. 
(Recall that a distribution function was a quadratic fit to the 
logistic transformation of the cumulative proportion of students 
in a group who were assigned selected essay scores or less. These 
functions and the probit functions were integrated on a 100-point 
grid.) The result was an estimate for each teacher and each 
secondary student group of the percentage of students in each 
group writing essays classified as Acceptable, as Requiring 


Remediation, and as Not Acceptable. 
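
The combination step described above can be sketched in code. What follows is an illustration only, not the procedure actually programmed for the study. The function names, the coefficient names (a and b for the probit function; c0, c1 and c2 for the quadratic fit), the 1-to-10 score range, and the use of grid-cell midpoints are all assumptions made for the example.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(x):
    """Inverse of the logistic (logit) transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_proportion(a, b, c0, c1, c2, lo=1.0, hi=10.0, points=100):
    """Estimate the share of a student group placed in a response class.

    Probit function (assumed form):    P(class | score s) = Phi(a + b*s)
    Group distribution (assumed form): logit F(s) = c0 + c1*s + c2*s**2
    """
    step = (hi - lo) / points
    total = 0.0
    mass = 0.0
    for k in range(points):
        left = lo + k * step
        right = left + step
        # Proportion of the group whose scores fall in this grid cell.
        weight = logistic(c0 + c1 * right + c2 * right ** 2) - logistic(
            c0 + c1 * left + c2 * left ** 2
        )
        # Probability of the classification at the cell midpoint.
        total += normal_cdf(a + b * 0.5 * (left + right)) * weight
        mass += weight
    # Renormalize, since the fitted curve need not run exactly from 0 to 1.
    return total / mass


def three_way_split(p_accept, p_accept_or_remedial):
    """Turn the two dichotomies into (Acceptable, Remediation, Not Acceptable)."""
    return (
        p_accept,
        p_accept_or_remedial - p_accept,
        1.0 - p_accept_or_remedial,
    )
```

Under these assumptions, dichotomy (i) supplies the proportion Acceptable and dichotomy (ii) the proportion Acceptable or Requiring Remediation; the remaining two categories follow by subtraction. Because the two probit functions are fitted separately, nothing in this sketch guarantees that the middle category comes out non-negative for every teacher.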


Finally, the percentages for the four groups of
postsecondary teachers--University English, University other,
CAAT English, CAAT other--were aggregated: the midmeans, which
are the arithmetic means of the middle halves of the
distributions, were calculated.
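
A midmean can be computed as in the short sketch below. The report does not say how the middle half was delimited when the number of teachers in a group was not a multiple of four; the fractional weighting of the boundary observations used here is one common convention and is an assumption, as is the function name.

```python
def midmean(values):
    """Arithmetic mean of the middle half of the sorted values.

    Observations that straddle the quartile boundaries receive a
    fractional weight (an assumed convention, not taken from the report).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("midmean of an empty list is undefined")
    lo, hi = n / 4.0, 3.0 * n / 4.0
    total = 0.0
    for i, x in enumerate(xs):
        # Overlap of the interval [i, i + 1] with the middle half [lo, hi].
        w = max(0.0, min(i + 1, hi) - max(i, lo))
        total += w * x
    return total / (hi - lo)
```

The midmean discards the lowest and highest quarters of the teachers' percentages, so one unusually harsh or unusually lenient judge has little effect on the aggregate.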

Results 

Table A1.9 gives the percentage of essays for each combination of
postsecondary teacher group and secondary student group estimated 


by means of the procedure described above to fall into each 


category. It is clear that the university teachers are generally 
harsher in their judgements: for the student group comprised
predominantly of students going on to university (SSHGD-POSTSEC), 
university instructors would find only about half the essays 
acceptable. It is interesting to note that the university English 
teachers tend to declare the other essays--those that do not fall 
into the Acceptable category--to be Not Acceptable, while the 


other university teachers more often favour remediation. 


The CAAT teachers find a somewhat higher percentage of the 
essays acceptable, though of the group comprised mostly of 
students planning to enter a CAAT (Group SSGD-POSTSEC), only 21
per cent wrote essays considered acceptable by CAAT English 
teachers, and 42 per cent wrote essays considered acceptable by 
other CAAT teachers. Remediation is suggested for 55 per cent 
and 48 per cent of SSGD-POSTSEC students, depending on the type 
of CAAT teacher. 


One other result seen in Table A1.9 is the difference
between SSHGD and SSGD groups. This is further confirmation of 
group differences described and discussed in Chapter Three of the 


report. 


The method of analysis used here has led to results having
the appearance of great consistency. The reader should be warned
again that the sample of teachers was small and cannot be
described as a probability sample of any known population.
Moreover, the variation in judgements was substantial. For 
example, the variation among the 12 university teachers in the 
percentage of SSHGD-POSTSEC essays judged to be acceptable was 
from 5 per cent to 79 per cent. The variation among the 12 CAAT 
instructors in the percentage of SSGD-POSTSEC essays judged to be 
acceptable was from 14 per cent to 53 per cent. For all these 
reasons, it would be foolhardy to make strong statements based on 
these results about the acceptability of essays written by
different groups of secondary students as judged from the point of 


view of university and CAAT instructors. 
APPENDIX A1A


SCORING OF ESSAYS


INSTRUCTIONS TO SCORERS


Following these instructions is the page from the
examination booklet listing the eight topics and giving
the instructions to the student. The last page is a
reproduction of the instructions from the outside of the
booklet. Note first that occasionally you may find an
unruled page included in the essay where the student,
owing to time constraints, has not had time to re-copy.
Please note as well that the student has been told to
make his corrections by neatly stroking and writing above
the excised material as we did not expect there to be
time to re-copy. In your impression scoring, therefore,
do not penalize for this inevitable untidiness.


You are asked to give your mark on your impression of the 


whole performance. Subtotals for vocabulary, style, etc. 
are not to be used. You are asked to make up your mind 


quickly, keeping up a rate of 20-25 scripts per hour.


Errors in spelling, mechanics and grammar will be 
separately scored for a large sample of the essays at a 
later stage as part of our effort to analyze and describe
the writing of secondary school graduates. You should not
therefore concern yourself with any tally of these. 


Look for excellences rather than penalize deficiencies, 
rewarding the writer involved enough to write in a direct 
and expressive way, and detached enough to show a 
consistent point of view.


The student was invited to agree, disagree, or to take an 
intermediate position with respect to the point of view
stated or implied by the topic; consequently if the
student takes a point of view or approach at variance
with the topic statement the essay is not to be
considered "off topic".


Give a mark of 1 through 10 (low to high) to each
composition. Please use the whole scale. Scripts you
consider the best are to receive 10 and scripts you
consider the worst, 1. The numbers 1 - 10 are wholly
arbitrary; please do not think in terms of percentages, 
passes, or failures. 
It is not necessary that you seek to place equal numbers
of essays in each category of 1 through 10, though you 
should use all ten. 


5. The zero is not part of the scale. Record a zero only
where the student has not attempted to write a
composition or where there are only fragments of notes. 


6. Record scores in the box in the upper right hand corner
of page 1 following the student number. For the scores 1
to 9 use the double digit 01, 02, ... 09. Please use a
pen with a color different from black (e.g. red).


7. Return total package in the original box using the
self-adhesive return address label provided. Use
registered mail. Please endeavour to have the box in the 
mail by Monday, June 28. 


SUGGESTIONS FOR ESTABLISHING
CATEGORIES AND DEVELOPING A RHYTHM 


In order to establish a standard in your mind, read 20-25
scripts selected at random from the whole set you have
received before you begin to mark. Place these scripts in 10
different piles representing the scale 1 to 10. After that
try to make an individual judgement on each piece of work
according to this standard, adding the script to the
appropriate pile.


You may find, as these compositions are the product of both 
general and advanced level twelves as well as thirteens,
that the range of ability represented is more extensive than
your sample has indicated. If so, revise your
classifications to cover the range, re-sorting those essays
already piled. At the point where your standard is firmly
established in your mind, pause to add the scores to those
essays already appraised. Then continue scoring and piling.
Try to work quickly, maintaining a rate of 20 or more
scripts per hour.


Please be alert to the problem of boredom, especially where
you happen to run into a string of essays on the same
subject. It is well to plan breaks when you are becoming
somewhat saturated.
INFORMATION


Each script is being scored by three people and their
perceptions, as represented by the score given, will be
combined to produce the raw score for the essay. 
APPENDIX A1B


THE ESSAY ERROR COUNT 


EXTRACTS FROM INSTRUCTIONS TO COUNTERS


Familiarize yourself with the error classifications, a few
of which may appear somewhat arbitrary, and perform a trial
run on a few of the essays to develop an efficient rhythm.
Check the errors noted against the classification chart to 
ensure that the instructions are being followed 
consistently. 


The classification of errors in the chart is inevitably
limited and arbitrary. You will have to make a number of
"judgment calls", and a few errors that perhaps bother you
especially may not fit any of the classes. For the latter,
with regrets or curses, ignore them. For the others, sooner
or later you will have to be arbitrary. For your own
sanity, make it sooner. I found the greatest difficulty in
classifying types of serious sentence errors, for, once a
student dives into a construction he cannot handle, there
are multiple consequences.


The division allows for 


F - Fragments and unanalyzable sentences 
R - Run on and run together sentences 


K - Seriously awkward, contorted sentences 


and then for some common classes of error related usually 
only to a portion of the sentence: 


MM - Misplaced modifiers other than verbals and verbal 
phrases 
MV - Misplaced, mis-related, or dangling participles, 


gerunds, infinitives 
S - Faulty subordination 


// - Faulty parallelism or faulty ellipsis affecting a
large sub-unit of meaning in the sentence 
Very serious faults are to be classified as well as you can
under the first group. 


I have attempted to cover most contingencies in the 
explanation of the code, but other ambiguous situations will 
inevitably emerge. Choose what you see as the most


appropriate classification and carry on. 


This error count is being carried out simultaneously by two
people, and matters should balance out by averaging results. 


The essays appear in order #1-#50. This order is unrelated 
to the scores received. 


Procedure 


1. The first 200 words, omitting the title, have been marked
off for you.


Your error count is to include the first 200 words only,
with the following exceptions: 


If the 200th word is the last word in a sentence, the end 
punctuation is to be included in the count.


If the 200th word is not the last word in a sentence, the 
whole sentence should be examined for errors falling 
under the sentence structure classification only, not 
including punctuation except where it contributes to the 
sentence structure error. 


Sample: It was a wierd [200th word] and wonderfull
game and which the Dodgers lost
didn't they.


Count mis-spelling of "weird" but not the mis-spelling of 
“wonderful”. 


Count the "and which" error as a major error (F) in
sentence structure.


Do not count the failure to include a comma after "lost"
or the failure to end with a "?".


2. Mark the count for each error classification (Sp, Ries, 
etc.) in the appropriate box beside the appropriate essay 
number, and give the subtotals in the boxes provided 
headed "TOTAL". Put the total error figure in the box to
the left marked "GRAND TOTAL". The count on fictional
essay #99 in the first set of boxes should clarify the
procedure. 


Do not include in your count errors that do not 
approximately fit the categories, and make rapid 
judgement calls where there is some ambiguity about error
and category provided. 


For the first 5 or 6 use a pencil to mark the errors on 
the essay, referring frequently to the classification of 
errors material, and record counts on your rough copy of 
tne icehart.. 


Then go back over these essays to determine the consistency
with which you are using the classification, making such
revisions in classification as seem necessary. Use a
bright coloured pen or marking pencil to score over your
penciled scoring and to revise your rough counts.


Continue with the remainder.


CODE AND EXPLANATION OF ERRORS
ERROR COUNTS 


(1) CONVENTIONS 


Spelling 


Count a mis-spelling of the same word only once. 


Accept both American and British spellings: 
penalise-penalize, theater-theatre, etc. 


Ignore obvious slips and failure to break at the 
syllable when a word carries to the next line. 


Classify all apostrophe errors as spelling
errors.


Errors in principal parts of verbs should be
classified as errors in verbs, not spelling
errors.


Abbreviations properly spelled are to be
accepted. This is a question of style, not of
spelling.


Punctuation 


The use of comma or no punctuation between
principal clauses that should be separate
sentences falls under "The Sentence" below. Do
not classify it under punctuation.


A comma before "and" in a series is usually
optional. Don't penalize.


For introductory prepositional phrases and short 
subordinate clauses, usage is variable. Don't 
penalize unless the sense of the sentence is 
adversely affected. 


Failure to put commas both before and after a
parenthetic expression should be penalized as
one error:


Example: My father who is bald loves me.
or:
My father who is bald, loves me.


Penalize only once the failure to place the
appropriate punctuation marks inside rather than
outside quotation marks. (Where the student has
carelessly placed quotation marks almost exactly
above the punctuation mark, give him the benefit
of the doubt.)


Failure to put quotation marks both before and 
after the words directly quoted--1 error, not 2. 


Ignore obvious slips on periods. Actually, the 
photocopying process may not have picked up a 
light dot, and a punctuation mark at the end of 
a line may have been missed in the photocopying. 


Treat the dash generously. Mark as a punctuation
error only if clearly wrong, or if it
represents a tendency to excessive informality
in style.


Accept titles either underlined or in quotation
marks (not both at once).


Capitalization 


Count an error involving the same word only 
once. 


Geographic locations, book titles, and other
proper names may contain one or more words.
Count only one capitalization error on each
occasion, as:


"british columbia"
"Rime Of The Ancient Mariner"


The conventions of English permit only language
subjects to be capitalized, as in English,
French, Spanish. Count an error in
capitalization of school subjects ONCE ONLY.
(Otherwise students selecting topic 4 where the
problem is frequent are being disadvantaged.)


Note: This convention does not appear to extend 
to school departments as Department of History, 
Mathematics Department. 


(2) SENTENCE STRUCTURE (Classify all very seriously
faulty sentences under F, R or K.)


Sentence Fragments
and sentences that defy analysis.


Run-On Sentences


Main ideas tagged together with "and", main
ideas separated only by a comma or by no
punctuation--"The Comma Splice".


Awkward


Seriously disjointed, contorted or incoherent
sentences, not mere lack of smoothness.


MM  Misplaced Modifier


Word, phrase, or clause (other than verbals,
noted under MV).


MV  Verbals


(infinitives, gerunds, participles) dangling,
misplaced, misrelated. Do not mark split
infinitives.


Faulty Subordination,


including orphan "which" clauses, and "when",
"where", and "because" clauses improperly used
as nouns.


e.g. I saw you were not at home which is why I
broke in. (Orphan "which")


The reason I came is because I was lonely.


PP  Faulty Parallelism


affecting the sentence. Include faulty ellipsis
under parallelism errors.


(3) VERBS


Subject-verb agreement


Be tolerant of collective nouns which, depending
on context, may express a singular or plural
idea.


Faulty tense or tense sequence


Principal parts of verbs


as in "I saw", "has seen", or "have lain" as
distinct from "have laid". Do not classify these
under "Spelling".


(4) PRONOUNS (Include pronominal adjectives)


Antecedent


Lack of an antecedent (omitting orphan "which",
described under subordination). Ambiguity owing
to doubt concerning antecedent. Shift from one
person/number to another in the sequence. Don't
penalize the shift back.


Case of Pronoun


The objective case following a copula verb is
acceptable: "It was me."


(5) [illegible]


Misuse of a part of speech


e.g., prepositions as conjunctions, adjectives
as adverbs


(6) DICTION


Diction


Word obviously inappropriate in meaning or in
accepted usage, as "amount" for "number",
"between" for "among", and the "anywheres",
"irregardless", etc. sorts of error (what the
Grade 13 examiners formerly called "crudities").


Rd  Redundancy


Obvious redundancy where the student uses in
close succession words or phrases obviously of
the same meaning. These will be situations in
which the sense obviously escapes the student.


Rg  Register


Grossly inappropriate choice of register of
language.


A consistently informal breeziness of style is
not to be penalized, nor is the occasional
choice of an "inappropriate" word, evidently
chosen for effect, even if you are not pleased
with the effect.


DO NOT PENALIZE:


-Sentence fragments written deliberately for
effect (even if you are unimpressed by the
effect).


-Split infinitive. Frequently the abuse is
tolerated and sometimes it is necessary to
simplify style.


-"Due to" when misused. The misuse is now
virtually the custom.


-"Shall" and "will" confusions. The distinction,
if there ever was one in normal usage, is
dropping out of the language.


-Contractions such as "hasn't", "don't",
etc. One's tolerance or intolerance of these is
a matter of acceptable style, not of error.


-Objective case following the copula verb: "It
was me".


-"Who" rather than "whom" at the beginning of a
sentence as in "Who(m) were you going with?"
Established usage.


-Minor faults in parallelism not affecting the
sentence structure.


OTHER DIRECTIONS:


When a serious error in structure has another
minor error (e.g., of punctuation) associated
with it, treat the event as a single error, the
more serious.


e.g., "Besides this serious threat science has
been the cause of ..."


The error lies in the lack of grammatical
relationship of the opening phrase to the
remainder and should be so classified. Ignore
the failure to provide a comma after the opening
phrase. This would be effectively to penalize
the same problem twice.


If, however, "serious" had been mis-spelled, you
would count both the sentence error and the
spelling error, as they are unrelated.


Treat the issue of hyphens generously.
There is such a wide variety of opinion on what
constitutes a compound word that it would be
unfair to penalize the student for your
predilections, however strongly held.


Ignore errors in the title.


APPENDIX A1C


INSTRUCTION SHEET FOR ESSAY APPRAISALS 


Background: You are likely aware that, this past spring, a
study was made of student achievement at Grades 12-13 and of
the programmes, secondary and postsecondary, in a large
number of educational institutions in Ontario. In fact, you
may have already been directly involved through responses to
questionnaires on the tests (OISE) or on programme
(Queen's).


In English, we tested all students in the sample in reading
comprehension and language, and, for a subsample, we also
provided an essay to be written in 40 minutes. The
instructions to students and topics are enclosed. The essays
have each now been holistically scored by at least three
markers on a 1-10 scale, so we now know how the essays range
across the scale. Fifty of the essays, those enclosed, have
been scored by all 36 markers on the marking team so that we
have been able with these to designate their position on the
scale with confidence.


Intent of Present Appraisal: We now need to characterize
the scale in several ways, one of which is to obtain
information from the post-secondary level from a variety of
departments as to which essays represent a satisfactory
level of writing ability for success at post-secondary
institutions. Certainly, a single sample of writing is
hardly an adequate basis for classification; nevertheless,
your appraisal will assist us to describe the state of
affairs more exactly than the 1-10 scale by itself could do.


The essays are not in order by score, nor is the small
sample representative of all the essays received. They are
numbered from 1-50 as on your return sheet. The other number
1-8 is not a score; it simply indicates the topic chosen.


Request: We are asking, through the contact person
delivering this package, an instructor from each of three
different departments or divisions within your institution
to give a brisk reading to these fifty essays and classify
them on the return sheet under the headings provided there.
Recognizing that different characteristics of "literacy" may
be required for success in different sorts of postsecondary
programmes, we are soliciting your impressions from the
standpoint of programmes represented in your department or
division rather than in general.


We have left it to the contact person in your institution as
to whom to approach; if for some reason you are unable to
carry out this assignment for us, please assist him/her in
passing the package on to a colleague and in explaining the
circumstances.


A rapid reading of the essays is all that is required. You 
do not have to mark anything, make notes in the margin, etc. 
We hope you will not find it a time-consuming task. 


Once your return sheet has been filled in, please destroy
the essays. Send back the return sheet only.


TABLE A1.1


Percentages of Examinees Responding Correctly
to Each Item in Both Forms of the Test of
Reading Comprehension and Language
Achievement (English)


Difficulty indices
(? marks an entry that is illegible in the source.)


Item         Form 1             Form 2
No.       SSGD   SSHGD       SSGD   SSHGD
 1         86      ?          61     70
 2         59      ?           ?     83
 3          ?      ?          45     61
 4          ?     38          46     60
 5          ?     14          20     27
 6         66      ?          58      ?
 7         54      ?           ?      ?
 8         40     54           5     50
 9         25     47          18      ?
10         18     29           ?      ?
11         76     82          64      ?
12         65      ?          62     76
13         46     64           ?     87
14         48      ?          48     61
15          ?      ?          48      ?
16          ?     47          40      ?
17          ?     43           ?     50
18         35     42           ?      ?
19          ?     30           ?     36
20          ?      ?           6     10
21          ?      ?          10     14
22          ?      ?          65     74
23         64     80          58     66
24         67     81          64      ?
25         83     88          68      ?
26          ?     39          62     65
27         54     67           ?     46
28         34     42          44      ?
29         67     76          26      ?
30          ?      ?           ?     45
31         64      ?          16      ?
32         36      ?           ?     23
33          ?      ?          44     58
34         26     32          46     58
35         65     67          58     69
36         40     38          50      ?
Mean        ?      ?           ?      ?
S.D.        ?    20.0        18.6   20.8


TABLE A1.2


Percentage of Examinees Omitting and Not
Reaching Items in Both Forms of the Test of
Reading Comprehension and Language
Achievement (English)


[Table body illegible in the source. Columns: Item No.; then,
for SSGD and SSHGD within Form 1 and Form 2, % Omitted and
% Not Reached.]


TABLE A1.3
Indices of Discrimination for Items in Both Forms
of the Test of Reading Comprehension and
Language Achievement (English)
(? marks an entry that is illegible in the source.)


Item No.     Form 1         Form 2
           (N=1612) a      (N=?) b
 1            46             49
 2            53             55
 3            71             51
 4            49             44
 5            38             34
 6            55             59
 7            63             55
 8            54             62
 9            69             38
10            64             58
11            44             64
12            38             65
13            57              ?
14            54             57
15             ?             55
16            61             52
17            60             53
18            41             63
19            52             54
20            55             48
21            56             56
22            41             52
23            65             42
24            63             29
25             ?             43
26            39              ?
27             ?             50
28            29             48
29            36             38
30            57             36
31            54              ?
32            65             48
33            49             54
34            42             52
35            29             52
36            33             43
Mean           ?             50
S.D.          11             10


Note: Decimal points have been omitted from the
table.


a This group of SSGD and SSHGD students took both
forms of the test in order Form 1--Form 2.

b This group of SSGD and SSHGD students took both
forms of the test in order Form 2--Form 1.


TABLE A1.4


Statistics Describing Both Forms of the
Test of Reading Comprehension and Language
Achievement (English)
(? marks an entry that is illegible in the source.)


                                  Form 1      Form 2
Number of Examinees               1612 a      1619 b
Number of Items                   36          36
Mean Score                        ?           ?
Standard Deviation                ?           ?
Highest Score                     36.00       36.00
Lowest Score c                    ?           ?
Reliability d                     ?           0.82
Standard Error of Measurement     ?           ?


a These examinees took both forms of the test in the
order Form 1--Form 2.
b These examinees took both forms of the test in the
order Form 2--Form 1.


c Negative scores arise as a consequence of the
application of a correction for guessing; in this case,
each wrong answer was scored (-1/4), whereas each
correct answer was scored (+1).


d Computed by applying a formula due to Hoyt (1941).
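As a reminder of what footnote d refers to, Hoyt's analysis-of-variance reliability can be sketched as follows. This is a minimal illustration on toy data, not the computation actually run for the table; the function names are hypothetical, and the standard error formula shown (standard deviation times the square root of one minus the reliability) is the conventional one, assumed rather than confirmed by this appendix.

```python
import math


def hoyt_reliability(scores):
    """Hoyt's analysis-of-variance reliability for a persons x items table.

    scores is a list of rows, one per examinee, each row holding that
    examinee's score on every item.
    """
    n = len(scores)        # number of examinees
    k = len(scores[0])     # number of items
    grand = sum(sum(row) for row in scores) / (n * k)

    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_persons - ms_residual) / ms_persons


def standard_error_of_measurement(sd, reliability):
    """Conventional SEM: score standard deviation times sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


# Toy data: four examinees, three items scored 1 (right) or 0 (wrong).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(hoyt_reliability(toy), 2))
```

For the toy table the coefficient works out to 0.75, the same value Cronbach's alpha gives for these data.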


TABLE A1.5


Correlations Among Scores on the Subtests and
Total Test of Both Forms of the Test of Reading
Comprehension and Language Achievement (English)
(? marks an entry that is illegible in the source.)


Test Part                                 1     2     3     4
1 Reading Comprehension Subtest          --     ?     ?    80
2 First Language Achievement Subtest     56    --     ?    83
3 Second Language Achievement Subtest    56    55    --    86
4 Total Test                             82    84    86    --


Note: Decimal points have been omitted. Correlations
below the diagonal are for Form 1; correlations above the
diagonal are for Form 2. The sample size on which the
correlations below the diagonal are based was 1612; for
the correlations above the diagonal, the sample size was
[illegible].


[...] was returned unscored. In the analysis of variance, each
missing score was estimated by the sum of the grand mean and the
main effects for the essay, the marker and the occasion for which
the score was missing.


APPENDIX A2


TECHNICAL REPORT ON THE TESTS OF FRENCH FOR FRANCOPHONES


This report is divided into three main sections. The first section
consists of a description of the Test de compréhension en lecture et
de connaissance de la langue (français) and the Test de composition
écrite. The results of an appraisal of these tests are reported in
the second section. In the third and last section, information about
the technical properties of the tests is given.


1. DESCRIPTION OF TEST CONTENT


The two forms of the Test de compréhension en lecture et de
connaissance de la langue (français) were produced by taking two
parts of the Test de français, langue d'enseignement (the first
and the fourth), and dividing them so as to produce two
"equivalent" test forms, each containing 35 multiple-choice
items. Each form of the Test de compréhension en lecture et de
connaissance de la langue (français) was divided into two parts.
The first part tested reading comprehension by means of two
reading passages, each approximately 500 words in length.
Students were asked to answer several multiple-choice questions
about each passage. These questions involved different types of
comprehension: literal understanding of a text; identification of
the main idea or objective of an author; and ability to draw
conclusions from, and see implications in, what an author says.
Although both forms of the test contained two reading passages,
Form 1 contained eleven reading comprehension questions, whereas
Form 2 contained fourteen.


The second part of the Test de compréhension en lecture et
de connaissance de la langue (français) was made up of a variety
of exercises testing language achievement. In one type of
exercise, the student was given four sentences. His task was to
read the sentences and either identify the one containing an error
in grammar or indicate that none of the four sentences was in
error. Form 1 of the test contained eight items of this type;
Form 2 contained seven.


A second language achievement exercise required the student
to read a sentence, and, if he decided that it contained an
error, to classify that error under one of four headings:


(a) an error in a noun, adjective or article,


(b) an error in a pronoun,


(c) an error in a verb,


(d) an error in an adverb, preposition or conjunction.


If the student concluded that the sentence was not in error, then
he chose the fifth response option for the item. Each form of the
test contained five items of this type.


In the final type of language achievement exercise, the
student was given an incomplete sentence and asked to choose (from
five pairs of words) the pair which best completed the sentence.
Form 1 of the test contained eleven of these sentence completion
items while Form 2 had only nine.
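Tallying the item counts quoted above gives the length of each part and of each form. The snippet is only a bookkeeping sketch; the labels "type1", "type2" and "type3" are shorthand assumed here for the three language achievement exercises in the order just described.

```python
# Item counts per exercise, as stated in the description above.
COMPOSITION = {
    "Form 1": {"reading": 11, "type1": 8, "type2": 5, "type3": 11},
    "Form 2": {"reading": 14, "type1": 7, "type2": 5, "type3": 9},
}


def summarize(composition):
    """Return {form: (language_items, total_items)}."""
    summary = {}
    for form, parts in composition.items():
        language = sum(n for name, n in parts.items() if name != "reading")
        summary[form] = (language, sum(parts.values()))
    return summary


summary = summarize(COMPOSITION)
for form, (language, total) in summary.items():
    print(form, "language items:", language, "total items:", total)
```

The language achievement part comes to 24 items in Form 1 and 21 in Form 2, and each form to 35 items in all.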


The Test de composition écrite was a direct translation of
the Writing Test developed for administration to Anglophone
students. The test offered eight topics, and asked the student to
write an expository essay on any one. The student was assured
that he could adopt a point of view that differed from or agreed
with the point of view expressed in the topic statement, and that
his primary concern should be to argue his position logically. A
more detailed description of this test is contained in Appendix
A1, which deals with the English language version.


2. TEST APPRAISAL


The tests of competence in the use of French by Francophones were
submitted for appraisal to teachers of Grade Twelve and Grade
Thirteen courses in the Francophone schools in the study. In all,
34 teachers did the appraisal task. Some responded from the point
of view of Grade Thirteen français courses, others from the
points of view afforded by Grade Twelve advanced level français
courses, Grade Twelve general level français courses, and Grade
Twelve basic level français courses. Some individuals responded
for courses at more than one level because they were teaching
courses at more than one level. As a consequence, the number of
appraisals for courses at any one level is smaller than 34--for
Grade Twelve basic level courses, the maximum number of
responses to an item was only five--although the total number of
appraisals across all course levels is larger than 34.


The appraisals must be interpreted in light of the nature of
the sample that the 34 teachers comprise. These 34 appraisers
were teaching in 13 of the 14 Francophone schools in the study.
No test appraisals were done by the teachers in one school. The
14 schools are themselves representative of the population of
Francophone schools in the province inasmuch as they constitute a
probability sample of the population. This implies that the
teachers who appraised the test are representative of the
population of français teachers in the province. But a
qualification arises because of the way the test appraisals were
distributed. Information was received from each school prior to
the administration of the tests about the number of persons in
the school who taught français courses at the Grade Twelve and
Grade Thirteen levels. A corresponding number of test appraisal
"kits" was sent to the school. But not all the appraisal forms
that were sent were completed and it was not ascertained whether
this was because some appraisers had refused to do the task, as
they obviously had in at least one school, or because incorrect
information on the number of français teachers had been supplied.
For these reasons, it is probably not reasonable to consider the
sample of 34 responding teachers as fully representative of the
français teachers in the province, although it is relatively so.
In any event, generalizing from this sample to a population is
hazardous because of the relatively large sampling error that must
be associated with such a small number of schools and teachers.


No appraisals were obtained from individuals who were
teaching français courses at the first year level in a CAAT
college or a university. An appraisal inventory was developed for
instructors at this level and it was distributed to a small
number of persons but no completed inventories were returned.


The format of the appraisal inventory for the français tests
was essentially the same as that for the English tests (see
Appendix A1). The first 19 questions of the inventory were
directed toward the reading comprehension part of the two forms
of the Test de compréhension en lecture et de connaissance de la
langue (français). The next 10 questions focussed on the language
achievement exercises in the same test. The final 11 questions
dealt with the Test de composition écrite. Most questions were
not directed toward particular test items. Instead, they asked
the appraiser to consider the language skills that were assessed
by the tests, the difficulty of different parts of the tests, and
the usefulness of the particular testing format employed in the
tests. Appraisers were given space to comment in writing at
various points throughout the inventory.


The main results of interest as regards the test appraisals
are contained in Table A2.1. This table consists of the
questions in the inventory, the associated response options and
the frequency of response to each option for each level of
course.


A detailed discussion of Table A2.1 is not provided. Given
the full discussion of results from the appraisal of the tests of
English language competence for Anglophones, and given that the
picture provided by Table A2.1, in conjunction with the written
comments of the appraisers, is similar to that provided by the
opinions of the secondary school appraisers of the English tests,
it would be redundant to repeat the same points here. A brief
summary of the conclusions supported by the appraisals of the
français tests seems more appropriate:


(a) The language objectives examined by the tests--reading
comprehension, language achievement and writing--were
seen as important.


(b) Although there was support for the use of a
multiple-choice testing format, the consensus was that
this kind of test must be supplemented by samples of
written work.


(c) The level of difficulty of the tests was judged to be
best suited to students in Grade Thirteen and Grade
Twelve advanced level courses, and least appropriate
for students in Grade Twelve basic level courses. The
tests were judged to be excessively difficult for the
latter group.


3. TECHNICAL ISSUES


3.1 Scoring the Test de compréhension en lecture et de
connaissance de la langue (français)


As indicated in the instructions to students writing this test,
the standard correction for guessing was applied. The
multiple-choice items in this test each contained five response
options; the standard correction (Lord and Novick, 1968)
for such items is to assign incorrect answers the weight (-1/4);
correct answers are weighted 1 and omitted questions are ignored.
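The scoring rule just described can be written out directly. The sketch below is a minimal illustration; the function name `formula_score` and its parameters are assumed for the example and are not taken from the scoring materials.

```python
def formula_score(n_right, n_wrong, n_options=5):
    """Score a test with the standard correction for guessing.

    Each correct answer counts +1, each wrong answer counts
    -1/(n_options - 1), and omitted items contribute nothing.
    """
    return n_right - n_wrong / (n_options - 1)


# A student with 20 right, 8 wrong and 7 omitted on a 35-item form.
print(formula_score(20, 8))

# Blind guessing on all 35 five-option items is expected to yield
# 7 right and 28 wrong, which the correction brings back to zero.
print(formula_score(7, 28))
```

The first call gives 18.0 and the second 0.0; a student with more than four wrong answers for every right one ends with a negative score.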


The two forms of this test did not possess equivalent
scales, at least in part because of differences in difficulty and
reliability. (These differences are described in greater detail in
subsequent parts of this section of Appendix A2.) To compensate
for these differences, the scores on Form 2 of the test were
equated to the scale of scores on Form 1 of the test. This
equation was made using the scores of those students who took
both forms of the test and the procedure outlined in Appendix D1.
Equated scores were used in the studies involving this test, the
results of which are described in the main part of this report.


3.2 Scoring the Test de composition écrite


All essays but 50 were scored by three different markers. The 50
essays singled out for special treatment were all scored by all
nine markers. The marks on this special set of 50 were used to
derive equations for adjusting the scores assigned by different
markers. Adjustments were necessary to remove differences among
markers in the average mark assigned, the range of marks assigned
and the reliability of marking. A single score was obtained for
each essay by entering the different marks assigned to it--three
or nine--into an equating formula, the specific form of which
varied with the number of marks and, in the case of all essays
only scored three times, with the particular individuals who
marked the essay. The procedure for equating essay marks is
described in Appendix D2. All results on these essays that are
presented in the main part of this report were derived using the
equated scores.


3.3 Difficulty of items in the Test de compréhension en lecture
et de connaissance de la langue (français)


The difficulty of each item in each form of the test, as indexed
by the average across schools of the percentage of students
responding correctly in each Francophone school, is reported in
Table A2.2. Percentages are reported separately for Grade Twelve
and Grade Thirteen. Given the sampling design of the study, these
percentages may be interpreted as estimates for the population of
Francophone students defined in this study. (For a definition of
this population, see Chapter Two of this report.)
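The difficulty index described above is a mean of school-level percentages rather than a pooled percentage. The sketch below illustrates the distinction on made-up data; it assumes an unweighted mean across schools (any weighting used in the study is not described here), and the function name and data layout are hypothetical.

```python
def difficulty_index(responses_by_school):
    """Average, across schools, of the per-school percentage correct.

    responses_by_school maps a school label to a list of booleans,
    True where a student answered the item correctly.
    """
    per_school = [
        100.0 * sum(answers) / len(answers)
        for answers in responses_by_school.values()
        if answers  # skip schools with no examinees on this item
    ]
    return sum(per_school) / len(per_school)


# Made-up data: school A has 3 of 4 correct, school B has 1 of 2.
item = {"A": [True, True, True, False], "B": [True, False]}
print(difficulty_index(item))
```

Here the index is 62.5, whereas pooling all six students would give about 66.7; averaging by school keeps a large school from dominating the estimate.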


The mean difficulty indices reported in Table A2.2 provide
clear support for two conclusions: Form 2 of the test was
considerably easier than Form 1, and on the average, Grade
Thirteen students performed better than Grade Twelve students.
With respect to this latter conclusion, an item-by-item study
reveals only two items in each form for which the performance of
Grade Twelve students was as good as or better than the
performance of Grade Thirteen students. (See the percentages
reported in Table A2.2 for items 17 and 18 of Form 1 and items
1 and 2 of Form 2.) (A breakdown of Grade Twelve results, in
which students are divided according to their educational plans,
is presented and discussed in the main report.) These conclusions
are supported by another view of item difficulty, the one
presented by the frequency distributions in Table A2.3.


It will be recalled that the Test de compréhension en
lecture et de connaissance de la langue (français) was divided
into two parts and contained four different types of items. The
mean and the standard deviation of the difficulty indices for each
item type are reported in Table A2.4. It can be seen from an
inspection of the means that the reading comprehension items in
the two forms were approximately equivalent in difficulty. Also,
the Type 2 language achievement items in Form 1 were
approximately equal in difficulty to those in Form 2. (Type 2
items consisted of a sentence; the student was to read it, decide
whether or not it contained an error, and classify the error, if
any, under one of four headings.) The main differences in
difficulty between the forms are seen to occur with respect to
the language achievement items of Types 1 and 3. (Type 1 items
consisted of four sentences; students were to read the sentences
and select the one, if any, containing a grammatical error. Type
3 items consisted of incomplete sentences that the student was to
complete by selecting the most appropriate pair of words.)


There are obvious differences in the difficulty of the
different types of items. Regardless of test form and grade
level, the reading comprehension items were the easiest of all.
Among the language achievement items, those of Type 2 were
easiest in Form 1, those of Type 3 were easiest in Form 2.


The main conclusion that can be drawn from this study of
the difficulty of items in the Test de compréhension en lecture
et de connaissance de la langue (français) is that the two forms
differed appreciably in difficulty. Moreover, the two forms,
taken together, formed a test that was more difficult than is
desirable for the Grade Twelve students as a whole, given that
the test was intended to spread students out as much as possible
in terms of their performance. (For a test composed of
five-option multiple-choice items, an average item difficulty of
somewhat higher than 0.60 would be ideal, if the test is to
discriminate well among students and scores are to range from
near chance level, i.e. 20 per cent correct, to near perfect,
i.e. 100 per cent correct.) By this same criterion, the test was
on the difficult side even for the Grade Thirteen students.
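The 0.60 benchmark is the midpoint between the chance level for five-option items and a perfect score; the text places the ideal somewhat above that midpoint. A short sketch of the arithmetic, with a hypothetical function name:

```python
def midpoint_difficulty(n_options):
    """Midpoint between chance-level and perfect proportion correct."""
    chance = 1.0 / n_options   # 0.20 for five-option items
    return (chance + 1.0) / 2.0


print(midpoint_difficulty(5))
```

For five options the midpoint is 0.6; with four options it would rise to 0.625, since chance success is then more likely.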


3.4 Speededness of the Test de compréhension en lecture et de
connaissance de la langue (français)


The available information concerning speededness is contained in
Tables A2.5 and A2.6. Before this information is considered,
however, an aspect of the test administration that bears on
speededness needs to be described. Each form of the test was
printed in two parts. The first part contained the reading
comprehension passages and related items; the second part
contained the three different types of language achievement
exercises. The time allowed for the test was 40 minutes, but
this time was divided into two 20-minute periods. Students who
had not completed the reading comprehension part of the test by
the end of the first 20-minute interval were instructed to go on
to the language achievement part of the test. Students who
finished the reading comprehension part in less than 20 minutes
could go on to the language achievement part and students could go
back over any part of the test, time permitting, once they had
finished the language achievement part. In Form 1, the break
between reading comprehension and language achievement occurred
between the eleventh and twelfth items. In Form 2, the break
occurred between the fourteenth and fifteenth items.


Table A2.5 is a presentation of “not reached" statistics. 
An item is judged to be not reached if the student fails to 
respond to it and to all succeeding items in the test. The 
figures in Table A2.5 suggest that Form 2 was less speeded than 
Form 1 because, in general, the percentage of students failing to 
reach a given item in Form 2 is less than the percentage failing 
to, reach the corresponding item in Form 1. The major portion of 
this difference in speededness seems attributable to the different 
lengths of the language achievement parts of each test form. 
Recall that the language achievement part of Form 1 contained 24 
items whereas the corresponding part of Form 2 contained only 21 
items. Because the instruction to go on to the second part of the 
test came three items earlier for students working Form 1 than 
for students working Form 2, it can be argued that the fairest 
comparison of not reached percentages would involve item 'n', say, 
of Form 1 and item 'n+3' of Form 2. When this is done, the 


difference in not reached statistics is relatively small. 
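
The not-reached tabulation defined above can be sketched in a few lines. This is only an illustration, not the procedure used in the study: the data layout (one list of responses per student, with `None` standing for a blank) and the function names are assumptions made for the example.

```python
from typing import List, Optional

Response = Optional[str]  # None marks an item left blank (assumed encoding)


def first_not_reached(responses: List[Response]) -> int:
    """Index of the first item in the trailing run of blanks.

    Returns len(responses) when the last item was answered, i.e. when
    every item counts as reached.
    """
    idx = len(responses)
    while idx > 0 and responses[idx - 1] is None:
        idx -= 1
    return idx


def percent_not_reached(sheets: List[List[Response]]) -> List[float]:
    """Percentage of students who did not reach each item."""
    n_items = len(sheets[0])
    counts = [0] * n_items
    for sheet in sheets:
        # An item is not reached if it and all succeeding items are blank.
        for item in range(first_not_reached(sheet), n_items):
            counts[item] += 1
    return [100.0 * c / len(sheets) for c in counts]
```

Because the second part began three items later in Form 2 than in Form 1, the comparison suggested in the text would set the Form 1 figure for item n against the Form 2 figure for item n + 3.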


A somewhat arbitrary approach to assessing the speededness 
of a test is to consider both the percentage of students who 
finish three-fourths of the items and the number of items reached 
by at least four-fifths of the group. A test is said, by one rule 
of thumb, to be speeded if less than 100 per cent of the students 




reach the three-quarter mark of the test or if less than 80 per 
cent of the students complete the test. The three-quarter mark of 
each form (27 items) was reached by 96 per cent or more of the 
students in each grade. More than 80 per cent of the students 
completed Form 2, and although a smaller percentage than 80 
completed Form 1, if attention is focussed on item 32, the point 
in Form 1 of fair comparison with the end of Form 2, it will be 


seen that more than 80 per cent of students reached this item. 
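
The rule of thumb just applied can be written out as a small check. The sketch below assumes that each student's record has already been reduced to the number of items reached (the position of the last item answered); the function and key names are hypothetical.

```python
from math import ceil
from typing import Dict, List


def speededness_check(items_reached: List[int], n_items: int) -> Dict[str, object]:
    """Apply the rule of thumb quoted in the text.

    A test is called speeded if fewer than 100 per cent of students reach
    the three-quarter mark or fewer than 80 per cent complete the test.
    """
    three_quarter_mark = ceil(3 * n_items / 4)  # 27 for a 35-item form
    n = len(items_reached)
    pct_mark = 100.0 * sum(r >= three_quarter_mark for r in items_reached) / n
    pct_done = 100.0 * sum(r >= n_items for r in items_reached) / n
    return {
        "three_quarter_mark": three_quarter_mark,
        "pct_reaching_mark": pct_mark,
        "pct_completing": pct_done,
        "speeded": pct_mark < 100.0 or pct_done < 80.0,
    }
```

Read strictly, the first condition flags any test on which even one student stops short of the three-quarter mark, which is why a form reached by 96 per cent of students at item 27 still counts as speeded under this rule.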


A final observation about Table A2.5 concerns the 
differences between Grade Twelve and Grade Thirteen. In general, 
the between-grade difference in the percentage of students 
reaching a given item is small--five per cent or less. Moreover, 
these differences on Form 1 favour neither grade level; for Form 
2, the differences consistently favour Grade Thirteen. 


The evidence in Table A2.5, although it pertains to each 
test form as a whole, is most indicative of the speededness of 
the language achievement parts of the test forms. Information 
that reveals something about the speededness of the reading 
comprehension parts is contained in Table A2.6. The numbers 
reported in this table are the percentages of students who failed 
to respond to a given item, although they did respond to a 
subsequent item in the test. It can be seen that the number of 
omissions increases up to the break point between the reading 
comprehension and language achievement parts of a test form. This 
increase almost certainly reflects the effect of speededness in 
the reading comprehension parts of the test forms, although it 
must be recognized that omissions due to failure to identify a 
correct answer are confounded in these figures with omissions due 
to lack of time. Applying the rule of thumb stated earlier, the 
reading comprehension parts of the test forms are judged to be 


only very slightly speeded. 
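
The omission statistic differs from the not-reached statistic only in requiring a later response. A minimal sketch of the distinction, under the same assumed encoding (a list of responses per student, `None` for a blank; the function name is hypothetical):

```python
from typing import List, Optional


def percent_omitted(sheets: List[List[Optional[str]]]) -> List[float]:
    """Percentage of students who left each item blank but answered a later one."""
    n_items = len(sheets[0])
    counts = [0] * n_items
    for sheet in sheets:
        # Position of the last answered item; -1 if the sheet is entirely blank.
        last = max((i for i, r in enumerate(sheet) if r is not None), default=-1)
        # Only blanks that precede the last answered item count as omissions;
        # blanks after it are "not reached" instead.
        for i in range(last):
            if sheet[i] is None:
                counts[i] += 1
    return [100.0 * c / len(sheets) for c in counts]
```

As the text notes, a count of this kind cannot separate blanks caused by lack of time from blanks caused by not knowing the answer.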


Two conclusions can be drawn from the data on speededness 
that have been presented: 

(a) Both forms of the Test de compréhension en lecture et 
de connaissance de la langue (français) were somewhat 
speeded, Form 1 being slightly more so than Form 2. 
This difference between the forms is, for the most 
part, explainable by the fact that the break between 
reading comprehension and language achievement came 


earlier in Form 1 than Form 2. 


(b) There was very little difference in the degree of 
speededness of these tests for Grade Twelve students as 
compared with Grade Thirteen students. What small 
difference there was tends to indicate that the Grade 
Thirteen students worked slightly faster than the Grade 


Twelve students, particularly on Form 2. 


3.5 Item Discrimination in the Test de compréhension en lecture 
et de connaissance de la langue (français) 


A crude measure of item discrimination is the biserial 
correlation coefficient between scores on an item (simply 1 for 
correct and 0 for wrong or omitted) and scores on the total test. 
In tests designed to spread examinees over the range of possible 
scores on the test, it is desirable to have relatively high and 
positive item-total biserial correlation indices, say 0.3 or 


higher. 
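
The report does not state which computing formula was used for the biserial coefficient. The sketch below assumes the usual textbook form, r = ((M1 - M0) / s) * (p * q / y), where M1 and M0 are the mean total scores of students who got the item right and wrong, s is the standard deviation of all total scores, p is the proportion answering correctly, q = 1 - p, and y is the ordinate of the standard normal curve at the point that divides it into areas p and q. The function name is hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from typing import List


def biserial(item_scores: List[int], total_scores: List[float]) -> float:
    """Item-total biserial correlation for an item scored 1 (right) or 0 (wrong/omitted)."""
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not right or not wrong:
        raise ValueError("the item must have both right and wrong responses")
    p = len(right) / len(item_scores)
    q = 1.0 - p
    std_normal = NormalDist()
    # Height of the standard normal density at the cut that splits it p : q.
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(right) - mean(wrong)) / pstdev(total_scores) * (p * q / y)
```

With this form the coefficient can exceed 1 in small or unusual samples, which is one reason it is described here only as a crude index.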


The biserial correlation coefficients for each item in both 
forms of the Test de compréhension en lecture et de connaissance 
de la langue (français) were computed using the test data from 
two randomly selected subsamples of the total sample of 
Francophone students. These groups comprised both SSGD and SSHGD 
students. The resulting correlations, which we shall call 
"discrimination indices," are reported in Table A2.7. (Note that 
decimal points have been omitted from the table.) Only the 
indices for three items in Form 1 and two in Form 2 fail to 
exceed the benchmark figure of 0.3. Generally speaking, then, 


the items in these two test forms may be said to discriminate 
adequately among the examinees in the two groups that were 
studied. 


3.6 Distribution and Reliability Statistics for the Test de 
compréhension en lecture et de connaissance de la langue 
(français) 


Some additional information about the two forms of the test is 
reported in Tables A2.8 and A2.9. It is clear from some of the 
figures reported in Table A2.8 that both tests were relatively 
difficult for these students and that Form 1 was more difficult 
than Form 2. No student achieved the maximum possible mark on 


either form and the mean scores were exceptionally low. 


(The statement about the comparative difficulty of the two 
forms follows, despite the fact that the results for each form 
are based on different groups of students, because the groups were 
equivalent in the sense that they were formed by random 


assignment of students from the same pool of students.) 


The index of reliability and the standard error of 
measurement are measures of the degree to which test scores might 
be expected to remain stable over repeated applications of the 
same or equivalent tests to the same student. The reliability 
coefficients for these test forms are relatively low; this is 
due, no doubt, in large part to the heterogeneous nature of the 
content of the tests. (The correlations among the parts of the 
test form, as reported in Table A2.9, are only moderately high.) 
Another reason that the reliability coefficients are low is that 
the test forms were on the difficult side for Ontario Francophone 
students; hence scores on the test were not dispersed over the 
full range of the test score scale as they would be in a test at 
a more suitable level of difficulty. Despite the relatively low 
reliability of the test forms and their relatively high standard 
errors of measurement, the reliability of both instruments is 
adequate for making the kind of group comparisons and regression 


analyses demanded in this study. 
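
A footnote to Table A2.8 attributes the reliability coefficients to a formula due to Hoyt (1941), which estimates reliability from a students-by-items analysis of variance. The sketch below is one way to compute it, assuming a complete score matrix and the conventional relation SEM = SD * sqrt(1 - reliability); the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the report does not say which was used.

```python
from math import sqrt
from statistics import stdev
from typing import List, Tuple


def hoyt_reliability(scores: List[List[float]]) -> Tuple[float, float]:
    """Return (reliability, standard error of measurement).

    scores[i][j] is the score of student i on item j.
    """
    n = len(scores)      # number of students
    k = len(scores[0])   # number of items
    grand = sum(sum(row) for row in scores)
    correction = grand ** 2 / (n * k)

    # Two-way analysis of variance without replication.
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_students = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_items = sum(sum(row[j] for row in scores) ** 2 for j in range(k)) / n - correction
    ss_residual = ss_total - ss_students - ss_items

    ms_students = ss_students / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))

    reliability = 1.0 - ms_residual / ms_students
    totals = [sum(row) for row in scores]
    sem = stdev(totals) * sqrt(1.0 - reliability)
    return reliability, sem
```

The estimate falls as the spread of total scores narrows relative to residual variation, which matches the point made above about a difficult test whose scores are bunched in the lower part of the scale.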


3.7 Reliability of Scores on the Test de composition écrite 


The procedure used to estimate the reliability of essay scoring is 
described in the section of Appendix A1 entitled "TECHNICAL 
ISSUES ABOUT THE TESTS", to which interested readers are referred. 


The figures needed to estimate the reliability of the 
marking of the français essays are reported in Table A2.10. The 
remarks that could be made about these results parallel those that 
are made in Appendix A1 for the English essays. 


Application of the procedure described in Appendix A1 leads 
to an estimate of reliability of 0.77. The corresponding standard 
error of measurement is 0.72. Both these figures support the 
conclusion that the scoring of français essays was reliable enough 
for the purpose to which the scores were put--comparing different 
groups of students and predicting school grades. 


For further information concerning the interpretation that 
can be given to the coefficient of reliability and the standard 
error of measurement, the reader is again referred to Appendix 
A1. The qualifying remarks found there apply with equal validity 
to the scoring of the Test de composition écrite. 


TABLE A2.1 
A FREQUENCY TABULATION OF RESPONSES TO THE TEST-INVENTAIRE ESTIMATIF 
FOR THE TEST DE COMPREHENSION EN LECTURE ET DE CONNAISSANCE 
DE LA LANGUE (FRANCAIS) AND THE TEST DE COMPOSITION ECRITE 


(NUMBER OF RESPONDENTS - 34)* 


13ème année | 12ème année: niveau avancé | niveau général | niveau de base 


COMPREHENSION EN LECTURE 


Cette section de l'inventaire 
a trait à la première partie 
des deux formules correspondant 
du test de compréhension en 
lecture et de connaissance de 
la langue. 


1. Considérez-vous qu'un test 
de compréhension en lecture 
--pas nécessairement celui 
utilisé ici--détermine une 
composante importante de 
la connaissance de la 
langue à ce niveau? 

(1) OUI 


* See text for explanation of 
the number of respondents. 


Questions 2 à 6: Pour ces 
questions utilisez les 
codes-réponses ci-dessous: 

(1) trop facile 
(2) facile 
(3) bien choisi 
(4) difficile 
(5) trop difficile 


2. Si l'on considère les 
deux formules du test, 
il y a quatre textes 
relatifs à la compréhension 
en lecture. Quelle est 
votre opinion générale 
quant au niveau de 
difficulté de ces quatre 
textes? 


Déterminez la difficulté 
de chaque texte séparément 
en fonction des codes-réponses 
cités plus haut. 


3. Texte sur l'exploration 
lunaire 


4. Texte sur la langue 
française 


5. Texte sur la société 
de consommation 


6. Texte sur le camp de 
concentration 


7. Ces quatre textes sont-ils 
représentatifs de ce que 
vous attendiez des 
étudiants de ce niveau 
quant à la compréhension 
en lecture? 

(1) OUI 
(2) NON 


Questions 8 à 11: L'un des 
textes vous semble-t-il ne 
pas convenir pour des raisons 
autres que celle de la 
difficulté? 


8. Texte sur l'exploration 
lunaire 

(1) OUI 
(2) NON 


9. Texte sur la langue 
française 

(1) OUI 
(2) NON 


10. Texte sur la société 
de consommation 

(1) OUI 
(2) NON 


11. Texte sur le camp 
de concentration 

(1) OUI 
(2) NON 


En général, les rubriques sont 
destinées à tester les 
compétences de l'étudiant dans 
les domaines suivants: 

I. compréhension littérale 
du texte 
II. identification de l'idée 
principale ou des 
objectifs de l'auteur 
III. conclusions ou 
implications 


Les questions 12 à 14 sont 
relatives au problème de savoir 
si les étudiants devraient 
posséder ces trois compétences 
avant de commencer les cours 
de français de chaque niveau. 
Sélectionnez vos réponses à 
ces questions parmi les options 
suivantes: 

(1) tous les étudiants 
(2) plus de 75% des étudiants, 
mais pas tous 
(3) de 51% à 75% des étudiants 
(4) de 26% à 50% des étudiants 
(5) un étudiant ou plus, mais 
moins de 26% 
(6) aucun 


12. Combien parmi les étudiants 
commençant les cours à ce 
niveau devraient pouvoir 
comprendre littéralement 
un texte? 


13. Combien d'étudiants 
devraient pouvoir 
identifier l'idée principale 
ou les objectifs 
de l'auteur? 


14. Combien d'étudiants 
devraient pouvoir tirer 
des conclusions ou 
dégager des implications? 


Les questions 15 à 17 ont 
trait à une évaluation de 
l'attention accordée à 
chacune des compétences 
en lecture dans les cours 
que vous enseignez à chaque 
niveau. Sélectionnez votre 
réponse à ces questions en 
choisissant parmi les options 
suivantes: 

(1) attention très marquée 
(2) attention marquée 
(3) attention modérée 
(4) attention particulière 
à un étudiant plus faible 
(5) attention nulle 

15. Quelle attention 
accordez-vous en lecture 
à la compréhension 
littérale du texte? 


16. Quelle attention 
accordez-vous en lecture 
à l'identification de 
l'idée principale ou des 
objectifs de l'auteur? 


17. Quelle attention 
accordez-vous en lecture 
au pouvoir de tirer des 
conclusions ou de dégager 
des implications? 


18. Y a-t-il des compétences 
importantes en lecture 
qui n'ont pas été testées 
dans ces tests mais qui 
auraient dû l'être? 

(1) OUI 
(2) NON 


19. La formule de questions à 
choix multiple est-elle 
une méthode bien fondée 
pour analyser au moins les 
trois compétences en lecture 
mentionnées ci-dessus? 

(1) OUI 
(2) OUI (conditionnel) 
(3) NON 


CONNAISSANCE DE LA LANGUE 


20. Pensez-vous que la formule 
de questions à choix 
multiple soit, en général, 
satisfaisante pour analyser 
les compétences des 
étudiants quant à la langue? 

(1) OUI 
(2) OUI (conditionnel) 
(3) NON 


21. Lorsque la formule de 
questions à choix multiple 
est utilisée, devrait-elle 
être complétée par autre 
chose? 

(1) OUI 
(2) NON 


Questions 22 à 24: En termes de 
convenance et de difficulté en 
tant que mesures de compétence 
linguistique, comment 
évalueriez-vous les éléments de: 
"identification d'une phrase 
fautive", "identification et 
classification d'une erreur" et 
"phrases à compléter" qui 
apparaissent dans ces tests? 
Sélectionnez votre réponse à ces 
questions en choisissant parmi 
les options suivantes: 

(1) appropriés mais trop 
faciles 
(2) tout à fait appropriés 
(3) appropriés mais trop 
difficiles 
(4) inappropriés pour des 
raisons autres que la 
difficulté 


22. Quelle est votre évaluation 
en ce qui concerne les 
éléments d'identification 
d'une phrase fautive? 


23. Quelle est votre évaluation 
en ce qui concerne les 
éléments d'identification 
et de classification d'une 
erreur? 


24. Quelle est votre évaluation 
en ce qui concerne les 
éléments de phrases à 
compléter? 


25. Quelle est votre évaluation 
d'ensemble quant au niveau 
de difficulté de la partie 
des tests ayant trait à la 
connaissance de la langue? 

(1) trop facile 
(2) facile 
(3) bien choisi 
(4) difficile 
(5) trop difficile 


26. Quelle est votre évaluation 
des éléments des tests 
relatifs à la connaissance 
de la langue du point de 
vue de l'accent mis sur 
l'usage, le style, la 
grammaire, la structure 
des phrases, les expressions 
idiomatiques? 

(1) ils fournissent un bon 
équilibre en testant 
des secteurs importants 
de la connaissance de 
la langue 

(2) bien que testant un 
certain nombre de 
secteurs importants, 
l'accent n'est pas 
mis de façon 
équilibrée sur les 
différents éléments 

(3) des secteurs importants 
de la connaissance de 
la langue sont omis 
ou testés trop 
superficiellement 


27. Combien d'étudiants devraient 
avoir les compétences 
estimées dans les tests de 
connaissance de la langue 
en entrant dans un cours 
de niveau donné? 

(1) tous 
(2) plus de 75%, mais pas 
tous 
(3) de 51% à 75% 
(4) de 26% à 50% 
(5) un ou plus, mais moins 
de 26% 
(6) aucun 


28. Quelle attention accordez-vous 
au développement des 
compétences requises pour les 
tests de connaissance de la 
langue? 

(1) attention nulle--les 
compétences requises 
sont sans intérêt 
(2) attention très 
marquée 
(3) attention marquée 
(4) attention modérée 
(5) attention particulière 
à un étudiant plus 
faible 
(6) attention nulle--les 
compétences requises 
sont trop avancées 


29. Combien d'étudiants qui 
terminent avec succès les 
cours de français à ce 
niveau devraient avoir les 
compétences évaluées par 
les tests de connaissance 
de la langue? 

(1) tous 
(2) plus de 75%, mais pas 
tous 
(3) de 51% à 75% 
(4) de 26% à 50% 
(5) un ou plus, mais moins 
de 26% 
(6) aucun 


TEST DE COMPOSITION ECRITE 


NOTE: Dans vos réponses 
aux questions 30 à 35, 
vous ne vous référerez 
pas en particulier à 
l'essai proposé. 


30. Quelle importance 
accordez-vous à un 
exemple de la composition 
de l'étudiant dans une 
évaluation de la compétence 
au niveau de la langue? 

(1) importance très marquée 
(2) importance marquée 
(3) importance modérée 
(4) importance nulle 


31. Comment considérez-vous 
l'utilisation à la fois 
d'un test de connaissance 
de la langue avec questions 
à choix multiple et un 
exemple de composition, dans 
l'évaluation de la compétence 
au niveau de la langue? 

(1) la formule de questions 
à choix multiple est 
satisfaisante en 
elle-même 
(2) le recours aux deux 
formules est important 
(3) la formule d'un exemple 
de composition est 
satisfaisante en 
elle-même 
(4) aucune des deux 
formules n'est 
satisfaisante 


32. Si, par rapport à un 
ensemble, les résultats 
d'un étudiant obtenus 
au test de composition 
écrite étaient différents 
de ceux obtenus au test 
à choix multiple (du type 
de ceux que l'on fait 
passer dans cette étude), 
quels sont ceux que vous 
considéreriez les plus 
valables en tant que 
mesure de la compétence 
d'un étudiant au niveau 
de la connaissance de la 
langue? 

(1) les résultats obtenus 
à la composition écrite 
(2) les résultats obtenus 
au test à choix 
multiple 
(3) des résultats qui 
combineraient les 
deux avec l'accent mis 
sur la composition 
écrite 
(4) des résultats qui 
combineraient les deux 
avec l'accent mis sur 
le test à choix multiple 
(5) des résultats qui 
combineraient les deux 
de façon équivalente 


33. Combien d'étudiants devraient 
pouvoir écrire un essai correct 
- du type présenté ici - à son 
entrée dans les cours de 
français au niveau donné? 

(1) tous 
(2) plus de 75%, mais pas 
tous 
(3) de 51% à 75% 
(4) de 26% à 50% 
(5) un ou plus, mais moins 
de 26% 
(6) aucun 


34. Quelle attention accordez-vous, 
dans votre enseignement, au 
développement de la compétence 
des étudiants dans ce type de 
composition écrite? 

(1) attention très marquée 
(2) attention marquée 
(3) attention modérée 
(4) attention nulle 


35. Combien d'étudiants qui 
terminent avec succès les 
cours de français à ce 
niveau devraient pouvoir 
écrire un essai correct 
du type de ceux proposés dans 
ce test? 

(1) tous 
(2) plus de 75%, mais pas 
tous 
(3) de 51% à 75% 
(4) de 26% à 50% 
(5) un ou plus, mais moins 
de 26% 
(6) aucun 


36. Y a-t-il d'autres modes de 
composition écrite qui, du 
point de vue d'une culture 
générale, seraient aussi ou 
plus importants que celui 
utilisé ici? 


NOTE: Répondez aux questions 
37 à 40 en tenant compte de 
la composition écrite proposée. 


37. La composition écrite 
était-elle située à un 
niveau raisonnable de 
difficulté pour des 
étudiants suivant les cours 
de ce niveau? 


38. Le fait de n'avoir proposé 
qu'un seul type de composition 
écrite était-il de nature 
à défavoriser les étudiants? 


39. Compte-tenu de fait de n'avoir 
proposé qu'un seul type de 
composition écrite, que pensez- 
vous de la variété des sujets? 

(1) bonne 
(2) satisfaisante 
(3) non satisfaisante 


40. Numérotez les critères 
suivants de 1 à 5, 
par ordre d'importance 
décroissante, pour 
l'évaluation de ce type 
de composition écrite. 
Inscrivez les chiffres 
dans les cinq espaces 
prévus à cet effet dans 
chacune des 4 colonnes. 

Structure générale 

Présentation logique 
des arguments 

Style, principalement 
dans la phrase 

Technique de la langue: 
grammaire, usage, mécanique 

Choix des mots 


TABLE A2.2 


Percentages of Correct Answers to Items in the Test de 
compréhension en lecture et de connaissance de la 
langue (français) 


[Table body not recoverable from the scanned page.] 


Note: No estimate is reported for items judged to have two 
correct answers. 


TABLE A2.3 


Frequency Tabulations of Proportions Reported 
in Table A2.2 


                  Form 1              Form 2 
Percentage     SSGD    SSHGD       SSGD    SSHGD 

90-99 
80-89 
70-79 
60-69 
50-59 
40-49 
30-39 
20-29 
10-19 
00-09 

[Cell frequencies not recoverable from the scanned page.] 

TABLE A2.4 


[Table title and body not recoverable from the scanned page. 
Legible labels: rows for Reading Comprehension and for Language 
Achievement types 1 to 3; columns for No. of Items, Mean and 
S.D., by SSGD and SSHGD, for Form 1 and Form 2.] 


(a) The number of items of this type in each form of the test was 
one more than the number reported here, but one item in each form 
was excluded from this analysis because it was double-keyed. 


TABLE A2.5 


Percentage of Students at Each of Two Grade Levels 
Not Reaching the Items Appearing Towards the End 
of Each Form of the Test de compréhension en 
lecture et de connaissance de la langue 
(français) 


Item           Form 1              Form 2 
No.         SSGD    SSHGD       SSGD    SSHGD 

[Entries for items 20 to 35 not recoverable from the scanned page.] 


(a) This item of Form 2 was judged to be double-keyed 
and consequently was omitted from the item 
analysis. 


TABLE A2.6 


[Table title and body not recoverable from the scanned page.] 

TABLE A2.7 


Indices of Discrimination for Items in Both Forms of 
the Test de compréhension en lecture et de 
connaissance de la langue (français) 


            Form 1          Form 2 
Item No.    (N=248) (a)     (N=249) (b) 

  1         54              45 
  2         47              62 
  3         51              53 
  4         43              54 
  5         [illegible]     75 
  6         50              35 
  7         45              48 
  8         42              20 
  9         32              37 
 10         53              62 
 11         51              39 
 12         40              52 
 13         39              30 
 14         [illegible]     49 
 15         [illegible]     49 
 16         57              [illegible] 
 17         55              53 
 18         [illegible]     46 
 19         [illegible]     35 
 20         45              [illegible] 
 21         60              48 
 22         37              [illegible] 
 23         38              32 
 24         [illegible]     [illegible] 
 25         70              53 
 26         52              41 
 27         [illegible]     58 
 28         15              55 
 29         64              31 
 30         51              42 
 31         [illegible]     80 
 32         40              [illegible] 
 33         36              63 
 34         [illegible]     49 
 35         60              66 

Mean        [illegible]     48 
S.D.        [illegible]     14 


Note: Decimal points have been omitted from indices. 

(a) The students in this group wrote both forms of the test in 
the order Form 1--Form 2; only the results for Form 1 are 
reported here. 

(b) The students in this group wrote both forms of the test in 
the order Form 2--Form 1; only the results for Form 2 are 
reported here. 

(c) These items were judged to be double-keyed and were 
therefore omitted from subsequent analysis. 


TABLE A2.8 


Some Statistics Describing Both Forms of the 
Test de compréhension en lecture et de 
connaissance de la langue (français) 


                                  Form 1          Form 2 

Number of Examinees               248 (a)         249 (b) 
Number of Test Items              34              [illegible] 
Mean                              [illegible]     [illegible] 
Standard Deviation                [illegible]     [illegible] 
Highest Score                     29.00           29.00 
Lowest Score (c)                  [illegible]     [illegible] 
Reliability (d)                   [illegible]     0.80 
Standard Error of Measurement     [illegible]     [illegible] 


(a) These examinees took both forms of the test in the 
order Form 1--Form 2. 

(b) These examinees took both forms of the test in the 
order Form 2--Form 1. 

(c) Negative scores arise as a consequence of the 
application of a correction-for-guessing; in this case, 
each wrong answer was scored (-1/4) and each correct 
answer was scored (+1). 

(d) Computed by applying a formula due to Hoyt (1941). 
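
The correction-for-guessing rule in the footnote to Table A2.8 can be sketched as follows. The footnote gives only the two weights (+1 for a correct answer, -1/4 for a wrong one); treating an omitted or not-reached item as scoring zero is an assumption of this sketch, as are the function name and the use of `None` for a blank.

```python
from typing import List, Optional


def formula_score(responses: List[Optional[str]], key: List[str],
                  penalty: float = 0.25) -> float:
    """Score a response sheet with a correction for guessing.

    Each correct answer adds 1 and each wrong answer subtracts `penalty`.
    Blanks (None) are assumed to contribute nothing.
    """
    score = 0.0
    for given, correct in zip(responses, key):
        if given is None:
            continue  # omitted or not reached: no credit and no penalty
        score += 1.0 if given == correct else -penalty
    return score
```

A student who answers every item incorrectly on a 34-item form would receive -8.5 under this rule, which shows how negative lowest scores can arise.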


TABLE A2.9 


Correlations Among Scores on the Subtests and Total 
Test for Both Forms of the Test de compréhension en 
lecture et de connaissance de la langue (français) 


Test Part                                  1             2             3             4 

1. Reading Comprehension Subtest           -             41            [illegible]   84 
2. First Language Achievement Subtest      38            -             49            76 
3. Second Language Achievement Subtest     48            [illegible]   -             83 
4. Total Test                              80            [illegible]   82            - 


Note: Decimal points have been omitted. Correlations below 
the diagonal are for Form 1, those above the diagonal are 
for Form 2. For correlations below the diagonal, the sample 
size was 248; for correlations above the diagonal, the 
sample size was 249. 


TABLE A2.10 


Summary of the Analysis of Variance of Scores Assigned 
to 50 of the Tests de composition écrite by Five Markers on 
Two Occasions 


Source of               Degrees of        Mean            Estimate of 
Variance                Freedom           Square          Variance Component 

Essays                  49                [illegible]     [illegible] 
Markers                 4                 [illegible]     [illegible] 
Occasions               1                 [illegible]     [illegible] 
Essays x Markers        196               [illegible]     [illegible] 
Essays x Occasions      49                [illegible]     [illegible] 
Markers x Occasions     4                 [illegible]     [illegible] 
Residual                [illegible] (b)   [illegible]     [illegible] 


(a) This was a negative number, subsequently treated as zero. 

(b) [illegible] 50 essays, 5 markers and 2 occasions, this number should be 
196, making a total of 499. [illegible] scores of the set of 500 
were missing, either because of administrative 
error--occasionally an essay was [illegible] 
sent to a marker--or because of marker error--occasionally an 
essay was returned unscored. In the analysis of variance each 
missing score was estimated as the sum of the grand mean and 
the main effects for the essay, the marker and the occasion for 
which the score was missing. 


APPENDIX A3 


TECHNICAL REPORT ON THE TESTS OF FRENCH AS A SECOND LANGUAGE 


This set of four tests was designed to see how well students can 
read, understand, write and speak French. The tests were originally 
prepared for use in the IEA world-wide survey of achievement in 
French (Carroll, 1975) and were adapted slightly for use in Project 
II. The tests were administered to a selected group of those students 
taking one or more Grade Thirteen courses in French. All students 
chosen for testing were assigned the Reading Test and all but at 
most four students per school were assigned the Listening Test. 
Those students in each school who were not assigned the Listening 
Test were assigned either the Writing Test or the Speaking Test, but 
not both. 


This report consists of three main sections: a description of 
the content of each test; a summary of the test appraisals done by a 
group of secondary school teachers of Grade Thirteen French courses 
and a group of university instructors of first year French courses; 
and a presentation of information about some technical matters 


related to the tests. 


1. DESCRIPTION OF TEST CONTENT 


1.1 Reading Test 


This was a test of ability to understand written French. Students 
were allowed 30 minutes to complete a total of 39 
multiple-choice questions. The test was divided into two parts. 
The first part consisted of 17 incomplete sentences. Students 
were to choose the answer that best completed each sentence from 
among the four choices given. In the second part of the test, 
there were six reading passages ranging in length from 
approximately 60 words to approximately 150 words. Each passage 
was accompanied by three, four, or five questions or incomplete 
statements. On the basis of the material contained in the 
passage, students were to choose the best answer from among the 


four given for each item. 


1.2 Listening Test 


This was a test of ability to understand spoken French. It took 
about 25 minutes. A total of 34 multiple-choice items were 
presented on tape. Answers to each question were chosen from the 
four possibilities printed in the test book. The Listening Test 
had five parts. In the first part, students heard a series of 
statements. Each statement described one of four pictures labelled 
A through D in the test book. Students were to choose the picture 
that best fit the statement. Part Two consisted of a series of 
remarks or questions spoken by the tape-recorded voice. After 
each remark or question, students selected the most appropriate 
response from among the four offered in the test book. In the 
third part of the test, students listened to a series of short 
conversations. (One person asked a question or made a statement, 
and another replied.) After each conversation, students were to 
select, from among the four statements printed in their test 
books, the one that was correct according to the conversation. 
The fourth part of the Listening Test consisted of a series of 
short broadcasts or announcements, each followed by a 
question--still on tape--about what had been said. The best 
answer to the question was to be chosen from among four choices 
printed in the test book. Finally, in Part Five of the test, 
students heard relatively long dramatic scenes or conversations. 
Each scene or conversation was repeated and then several questions 
were asked about it. From among the four choices given in the 
test book, students were to select the best answer to each 
question. 


The number of items in each of Parts One to Five of the 
Listening Test was 7, 9, 8, 5 and 5 respectively. 


1.3 Writing Test 


This test of ability to write French was divided into two 
separately-timed parts. Fifteen minutes were allowed for the 
first part and ten minutes for the second. Part One of the test 
contained two types of item. The first type presented a sentence 
in which one word had been replaced by a blank space. Students 
were to complete the sentence by writing in the blank space a 
single French word, correct in both form and meaning. There were 
26 items of this type. The second type of item consisted of a 
pair of sentences. The first sentence had two or three words 
underlined; the second sentence had a corresponding number of 
blank spaces. Students were to use the underlined words in the 
first sentence, making whatever changes in form were required, to 
fill the blank spaces in the second sentence. In the test there 
were six items of this second type. In Part Two of the Writing 
Test, students were asked to write a short composition, 
approximately one-half page in length, on a given theme. Various 
aspects of the theme were listed, and students were asked to 
cover all of them in the order in which they were listed, and to 
omit none. 


1.4 Speaking Test 


This test of ability to speak French contained four parts and 
required about 25 minutes to administer to an individual student. 
All responses were spoken into a microphone and recorded on tape. 
In the first part of the test the student heard a series of 
French sentences spoken twice. After the second presentation of 
each sentence, the student was to repeat it. The second part of 
the test consisted of a set of pictures and a question about 
each. The student was to reply to each question in a complete 
sentence, and not in a single word. The third part of the test 
presented the student with a prose passage of approximately 150 
words. He was given three minutes to study the passage and then 
asked to read it aloud. The last part of the test consisted of 
two exercises. In the first exercise, the student chose a 
sequence of three pictures from among three such sequences. Once 
his choice had been made, he was given one minute to prepare, and 
then 30 seconds to present, an oral description of the events in 
the picture sequence. In the second exercise, the student was 
asked to choose one of three pictures and then to describe the 
following: the probable action that took place before the events 
depicted, the situation actually depicted, and the probable 
outcome of the situation. Again, the student had one minute for 
preparation and 30 seconds for oral presentation. 


2. TEST APPRAISAL 


The Tests of French as a Second Language were appraised by 64 
secondary school teachers and 18 university instructors. To 
qualify as an appraiser, a secondary school teacher had to be 
teaching a Grade Thirteen French course in one of the 53 
Anglophone schools in the study. Teachers from only 46 of the 
schools did the appraisals. University instructors qualified as 
appraisers if they taught courses in French to first year students 
in one of the universities chosen for inclusion in 
Secondary-Postsecondary Interface Project III--The Nature of 
Programs. Because no effort was made to draw a probability sample 
of university instructors of French and because it is not known 
how many secondary school teachers refused to appraise the tests, 
it is not possible to use the opinions of the individuals who did 
serve as appraisers as a basis for inferring what the population 
of instructors across the province would say about the tests. 


These results are presented only as the collective opinions of a 
group of 82 individuals qualified by virtue of their professions 
to serve as appraisers of these tests. 


2.1 Reading Test 


As their first task, appraisers were asked for judgments on each 
of the 17 items in Part A of the Reading Test. Secondary school 
appraisers were asked to judge whether the reading knowledge of 
French required to answer each of these multiple-choice questions 
was "old knowledge" that students should have on entry to a Grade 
Thirteen course, "new knowledge" that should be learned by all or 
some specified fraction of students taking a Grade Thirteen 
course, or "new knowledge" that no student in a Grade Thirteen 
course could be expected to acquire. University instructors were 
asked to make similar judgments with respect to students in first 
year university French courses. A tabulation of the responses 
given by secondary and postsecondary appraisers is contained in 
Table A3.1. 


It is clear from Table A3.1 that for most items in Part A 
of the Reading Test there was a measure of disagreement amongst 
the appraisers as to whether or not an item measured knowledge 
that all or most students could be expected to have at the end of 
secondary school, just prior to entry into university. The only 
items that at least 75 per cent of the secondary school 
appraisers judged 75 per cent or more of students should have 
the knowledge to answer correctly were items 1 through 7, 
inclusive, and items 12 and 13. Of these, only items 1 through 
5, inclusive, and item 12 were judged by at least 75 per cent of 
postsecondary appraisers to tap knowledge that students should 
have on entry to university courses. Viewing Part A of the 
Reading Test in this way leads to the conclusion that only 
one-half or fewer of the items in this part of the test measured 
knowledge that most appraisers felt most students should have at 
the interface between secondary and postsecondary education. 


The low opinion that the appraisers had of Part A is 
further confirmed by the tabulation of responses to a further 
question: "Do you consider that Part A of the Reading Test, 
taken as a whole, assesses an important component of French 
language achievement at the Grade Thirteen level?" Of the 61 
secondary school appraisers who responded to this question, only 
33 responded yes. Thirteen of the 18 university appraisers gave 
an affirmative answer. Among the reasons offered for responding 
negatively to this question were the following: some items tested 
rarely used forms which need not be stressed, some items were too 
easy, the range of material covered was too limited, and some 


items were "too literary". 


In the appraisal of Part B of the Reading Test, respondents 
were asked to assess the difficulty of the six reading passages 
contained in this part of the test. Results based on this 
assessment are given in Table A3.2. The secondary school 
appraisers judged passages 1, 2 and 4 to be on the easy side and 
passages 5 and 6 to be on the difficult side. Generally speaking, 
this assessment was confirmed by the university appraisers. In 
the assessment of appropriateness, discounting difficulty, only 
the fifth passage was labelled inappropriate by a substantial 
number of appraisers. This passage was described by several of 
the appraisers who offered written comments as uninteresting and 
confusing, even to those who know and use French as their first 
language. 


Appraisers were then asked about the kinds of knowledge 
tapped by the multiple-choice questions based on the reading 
passages. Roughly speaking, the questions were designed to 
assess: 


(a) the ability to understand the literal meaning of a 
passage, 


(b) the ability to identify the main idea or purpose of a 
passage, and 


(c) the ability to draw inferences or implications from 
what is written in a passage. 


Respondents estimated the proportion of students who should 
have these capabilities, both on entry to, and on successful 
completion of, courses at the Grade Thirteen and first year 
university levels. A tabulation of responses to these questions is 
contained in Table A3.3. The results that bear on the interface 
are the estimates from secondary school appraisers for students 
who have successfully completed Grade Thirteen courses and those 
from university appraisers for students entering first year 
university courses. Fewer than 75 per cent of the secondary 
appraisers expected 75 per cent or more of students at the 
interface--that is, students who had successfully completed a 
Grade Thirteen course in French--to be able to draw inferences 
from, and see implications in, the ideas they encountered in 
French passages like those in the Reading Test. But 75 per cent 
or more of these appraisers did expect 75 per cent or more of 
students at the interface to be able to read for literal 
understanding and to possess the ability to identify the main 
purpose or idea of a passage. The opinion university appraisers 
held of the abilities of students at the interface was somewhat 
lower. The only skill that at least 75 per cent of the university 
appraisers indicated was possessed by 75 per cent or more of 
students entering first year university courses in French was 
reading for literal understanding. 


Appraisers were asked to rate the degree of emphasis they 
gave to the aforementioned three reading skills in their Grade 
Thirteen and first year university French courses. The responses 
are tabulated in Table A3.4. Regardless of level, a large 
majority of appraisers indicated that they gave all three skills 
some degree of emphasis. This fact supports the inclusion in the 
Reading Test of items that attempted to assess these skills. 


The final three questions in the Reading Test appraisal 
concerned the reading skills not assessed by the test, the 
usefulness of the multiple-choice format for assessing the skills 
tapped by the Reading Test, and whether or not the kind of test 
represented by the Reading Test was used by appraisers in their 
Grade Thirteen and first year university courses. A tabulation of 
responses to these questions is given in Table A3.5. Although 
the majority of secondary school and university appraisers 
indicated that there were no important reading skills that were 
not assessed by the Reading Test, approximately one-fourth of the 
appraisers did feel there were important omissions. Two skills 
these individuals felt were not being assessed were the ability to 
make grammatical distinctions and the ability to detect 
differences in style and tone of writing. Even the abilities to 
identify main ideas and draw inferences were mentioned because 
several appraisers felt these skills were not well assessed by the 
test despite the fact that it was intended to measure them. 


On the question of whether or not the multiple-choice 
format was a reasonable way to assess at least those reading 
skills tapped by the Reading Test, only four appraisers expressed 
a negative view. Of the vast majority who said yes in response 
to this question, somewhat more than half qualified their answer. 
In their written comments, a number of appraisers said that 
multiple-choice questions should be supplemented by questions 
demanding an answer composed by the student. Other commentators 
mentioned the need for supplementary oral testing to clarify a 
student's multiple-choice responses. And several appraisers raised 
the issue of the possible influence of guessing on multiple-choice 
test scores. 


The last question for which a tabulation appears in Table 
A3.5 concerned the use by the appraisers of tests similar to the 
Reading Test. The number of secondary school appraisers who 
reported using this kind of test, at least occasionally, was 
about twice the number who reported not using it at all. The 
majority of university appraisers indicated that they did not use 
tests of this nature. 


The picture provided by the appraisals of the Reading Test 
can be outlined roughly as follows: 


(a) Part A of the test, consisting of incomplete sentences 
and four alternative completions, was viewed quite 
negatively. Only one-half or fewer of the items in 
this section were judged by most appraisers to be 
suitable for assessing some aspect of the reading 
competence of students at the interface. 


(b) Part B of the test, consisting of reading passages and 
related questions, was viewed much more favorably. The 
reading abilities assessed in this part of the test 
were ones that students were judged to possess in 
varying degrees at the interface and that instructors 
said they emphasized to a greater or lesser extent in 
their courses. The reading passages themselves were 
generally seen as satisfactory as far as difficulty was 
concerned, although various complaints were lodged 
against one passage or another. 


(c) The multiple-choice question format employed in the 
Reading Test was regarded as acceptable by all but a 
few of the appraisers. 


2.2 Listening Test 


Appraisers were asked whether students should have acquired the 
knowledge of French needed to answer each item in the Listening 
Test either before entering or during a Grade Thirteen or first 
year university course. Knowledge students would have on entry to 
a course was to be classified as "old"; knowledge that would be 
learned during a course was to be classified as "new". For each 
item testing new knowledge, secondary school appraisers were 
asked to estimate the percentage of students who would master the 
new material during the Grade Thirteen year. A tabulation of 
responses is given in Table A3.6. Those responses reflecting a 
judgment that 75 per cent or more of students at the interface 
should have the knowledge required to answer an item lie to the 
left of the dotted lines in Table A3.6; one line for responses 
in the secondary school appraisals, the other for responses in the 
university appraisals. For 26 of the 34 items, 75 per cent or 
more of the secondary school appraisers indicated that 75 per 
cent or more of the students at the interface should have the 
knowledge required to answer correctly. Applying the same 
criteria to the responses of university appraisers leads to the 
identification of 23 items (all but one of which are in the set 
of 26 items identified in the analysis of responses from 
secondary school appraisers) testing knowledge which should be 
"old" by the time a first year university French course is begun. 


It will be recalled from the description of the Listening 
Test given in the first part of this technical report that the 
test was divided into five parts. The line separations in Table 
A3.6 divide the items as they were divided among the five parts 
of the test. Part Four is the only part of the test for which a 
majority of the items (3 of 5) were judged by fewer than 75 per 
cent of both the secondary school appraisers and the university 
appraisers to require knowledge most students at the interface 
should have. 
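

The two-level 75 per cent criterion applied in the preceding 
paragraphs can be illustrated with a short sketch. It is a minimal 
example only: the function name, the endorsement counts and the 
assumption that all 64 secondary school appraisers responded to every 
item are hypothetical and are not taken from Table A3.6. 

```python
def meets_criterion(num_endorsing: int, num_appraisers: int,
                    threshold: float = 0.75) -> bool:
    """Return True if at least `threshold` of the appraisers judged that
    75 per cent or more of students at the interface should have the
    knowledge needed to answer the item."""
    return num_endorsing >= threshold * num_appraisers


# Hypothetical endorsement counts per item (NOT values from Table A3.6),
# assuming all 64 secondary school appraisers responded to each item.
NUM_SECONDARY_APPRAISERS = 64
endorsements = {1: 60, 2: 48, 3: 47}

qualifying_items = [
    item for item, count in endorsements.items()
    if meets_criterion(count, NUM_SECONDARY_APPRAISERS)
]
print(qualifying_items)  # [1, 2]
```

With 64 appraisers the cut-off is 48 endorsements, so the hypothetical 
item with 47 endorsements falls just short of the criterion. 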


Additional reactions to the Listening Test were obtained 
through a series of four relatively general questions. These 
questions and tabulations of responses to them are given in Table 
A3.7. It is apparent from the results for the first of these 
questions that a majority of the secondary school instructors who 
appraised the tests and all but one of the university instructors 
placed heavy or moderately heavy emphasis on the development of 
listening skills in their French courses. Responses to the second 
general question indicate that a substantial majority of both the 
secondary school and university instructors could think of no 
important listening skills other than those included in the test. 
Those respondents who did think that important skills were 
omitted went on to mention, in written comments, such things as 
the ability to adjust one's listening to the speed of a native 
speaker, the ability to discriminate various vowels and 
consonants, and the ability to identify the sound of a word in a 
group of similar-sounding words. 


In response to the question about the use of a tape-recorded 
script and multiple-choice questions in testing for achievement of 
listening skills, all but one respondent indicated that this kind 
of test was acceptable, although a number of respondents 
qualified their response. Several appraisers said that use of a 
"mechanical" recorded voice creates an artificial testing 
situation compared with use of a "live" reader. None of these 
comments showed an awareness of the need in survey testing to 
provide the kind of standardization of administrative conditions 
that use of a tape recorder provides but use of "live" readers 
does not. 


On the question about using tests similar to the Listening 
Test in their own courses, the majority of respondents indicated 
that they made some use of such tests. Those who said they made 
no use of this kind of test indicated that they did not have 
access to suitable test tapes or that they did not approve of 
multiple-choice questions. 


The results of the appraisal of the Listening Test suggest 
the following conclusions: 


(a) Approximately three-fourths of the items in the test 
were ones that most students at the interface should 
have been able to answer correctly. 


(b) The listening skills assessed by the test are 
important; most of the appraisers strove to teach these 
skills to some degree in their own courses. Moreover, 
most instructors were of the opinion that no important 
listening skills were neglected in the test. 


(c) Reasonable procedures were employed in the 
test--tape-recorded voice and multiple-choice 
questions--to achieve standardized testing conditions. 


2.3 Writing Test 


This test was composed of two separately-timed sections. 
Questions in the appraisal inventories dealt with each section of 
the test separately and the results are reported separately here. 


Part One of the Writing Test consisted of 32 sentence 
completion items (see Section 1.3 of this report). Appraisers 
were asked to judge each item in terms of whether or not the 
knowledge required to complete the statement was knowledge that 
students in Grade Thirteen or first year university courses could 
be expected to have, and if so, whether they would have acquired 
the knowledge before entering the course, in which case it would 
be "old" knowledge, or while in the course, in which case it 
would be "new" knowledge. Tabulations of the judgments of the 
two groups of appraisers are given in Table A3.8. Note that a 
dotted line in the tabulation for each group of respondents 
separates those responses reflecting a judgment that 75 per cent 
or more of students at the interface should know enough French to 
answer a test question correctly from responses reflecting a 
judgment that fewer than 75 per cent of students at the interface 
should have the requisite knowledge to answer a question 
correctly. 


The results in Table A3.8 clearly indicate that the large 
majority of both secondary school and university level appraisers 
thought that students at the interface should know enough French 
to answer the questions in Part One of the Writing Test. Only 
four of the 32 items received the endorsement of fewer than 13 
university appraisers as items that students entering first year 
courses should be able to answer. (The number 13 was used in 
this analysis because it represents approximately 75 per cent of 
18, the number of university appraisers.) All 32 items were 
judged by at least 75 per cent of the secondary school appraisers 
to tap knowledge that 75 per cent or more of students at the 
interface should have. In view of these figures, it is tempting 
to conclude that the questions in Part One of the Writing Test 
were highly appropriate for students at the interface. What is 
disturbing, however, is the fact that most of the questions 
tapped what most secondary school appraisers classified as "old" 
knowledge. There were only a few items that even as many as 25 
per cent of the secondary school appraisers judged to be testing 
knowledge that would be acquired in Grade Thirteen. From this 
point of view, Part One of the Writing Test could be criticized 
for focussing on inappropriately elementary knowledge. 
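

The choice of 13 as the cut-off for the 18 university appraisers can 
be checked with a little arithmetic. The sketch below is illustrative 
only; the helper name is hypothetical. It shows that exactly 75 per 
cent of 18 is 13.5 appraisers, so 13 falls slightly below and 14 
slightly above the nominal criterion. 

```python
from fractions import Fraction


def share(count: int, total: int) -> Fraction:
    """Exact fraction of appraisers represented by `count` out of `total`."""
    return Fraction(count, total)


NUM_UNIVERSITY_APPRAISERS = 18

# Exactly 75 per cent of 18 appraisers is 27/2, i.e. 13.5 appraisers.
exact_cutoff = Fraction(3, 4) * NUM_UNIVERSITY_APPRAISERS

# Compare the two whole numbers of appraisers on either side of 13.5.
for count in (13, 14):
    percentage = float(share(count, NUM_UNIVERSITY_APPRAISERS))
    print(count, f"{percentage:.1%}")
```

Thirteen appraisers represent about 72 per cent of the group and 
fourteen about 78 per cent, which is why 13 is described as only 
approximately 75 per cent of 18. 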


For results on the appraisal of Part Two of the Writing 
Test--that part in which students were asked to write a 
composition--reference is made to Tables A3.9, A3.10 and 
A3.11. Table A3.9 is based on responses to the request for an 
estimate of how many students entering and successfully completing 
French courses in Grade Thirteen and first year university should 
be able to write an acceptable composition of the type found in 
the Writing Test. More than two-thirds of the secondary school 
and university appraisers were of the opinion that 75 per cent or 
more of students at the interface should be able to write an 
acceptable composition of this type. 


In response to a question about the degree of emphasis given 
to the development of student competence in writing French 
compositions (see Table A3.10), the large majority of Grade 
Thirteen and university instructors reported the expenditure of 
some effort in this direction, and more than half the individuals 
in each group said they gave this objective either heavy or 
moderately heavy emphasis. The ability to write compositions in 
French is clearly an important skill in the opinion of these 
appraisers. 


One question in the appraisal asked whether or not there 
were types of writing other than the kind assessed in the Writing 
Test that students in Grade Thirteen and first year university 
courses should be able to do (see Table A3.11, Question 1). Most 
appraisers said that there were. The following additional kinds 
of writing were suggested: translations, formal essays in which 
literary evaluations are made, résumés, stories, character 
sketches, answers to questions about textual material read by the 
students, creative essays, and letters. 


Appraisers were then asked for an assessment of the 
difficulty of the composition assigned in Part Two of the Writing 
Test. Most of them agreed that it was at a reasonable level of 
difficulty (see Table A3.11, Question 2), although there were 
several comments to the effect that the assignment was too easy 
or too juvenile. 


On the matter of the criteria used to judge the 
compositions that students wrote (see Table A3.11, Question 3), 
the majority of appraisers felt that there were other criteria of 
importance in addition to total number of clauses, grammar, 
extent of vocabulary and accuracy in the use of vocabulary 
(excluding spelling and use of accents). The other criteria that 
appraisers suggested included style, clarity, maturity of ideas, 
originality, creativity, organization and logic, coherence, 
ability to interest the reader, flair, spelling and accents, 
variety of sentence structure, and ease of expression. 


The appraisals of Part Two of the Writing Test suggest the 
following conclusions: 


(a) The ability to write a composition in French is 
important. Moreover, it is a skill that most of the 
appraisers tried to foster in their students. 


(b) The writing task that was assigned was at a reasonable 
level of difficulty, if slightly on the easy side, for 
students at the interface. 


(c) There are types of writing other than the one included 
in the test that students should be able to do and 
there are important criteria for judging compositions 
in addition to the ones used with the test. 


Appraisers gave a summary evaluation of the Writing Test 
(see Table A3.11, Question 4) in terms of whether or not they 
used similar instruments in their own work. Most agreed that they 
did, at least occasionally. 


2.4 Speaking Test 


This instrument was divided into four parts. In the first part, 
students were asked to repeat a series of French phrases that 
were spoken to them by a tape-recorded voice. A tabulation of the 
responses of appraisers to the items in Part One is given in 
Table A3.12. It is clear from this tabulation that most students 
at the interface could be expected to have the knowledge needed 
to succeed on this part of the test. In fact, most of the 
secondary school appraisers judged the content of this part of the 
test to be focussed on material learned prior to Grade Thirteen. 


A summary of the assessment of Part Two of the test is 
given in Table A3.13. Items in this part of the test consisted 
of cartoons and the student's task was to answer in French a 
question asked in French about each cartoon. It is apparent from 
Table A3.13 that the items in this part of the test focussed on 
knowledge that students at the interface should have. In fact, 
the opinion of most of the secondary school appraisers was that 
this knowledge would have been learned in pre-Grade Thirteen 
French courses. 


In Part Three of the test the student read a prose passage 
aloud after having a minute to study its contents. Table A3.14 
is a summary report of appraisers' estimates of the percentage of 
students who could give an acceptable reading of this passage upon 
entry to and upon successful completion of a Grade Thirteen or a 
first year university course in French. Seventy-five per cent of 
the secondary school appraisers felt that 75 per cent or more of 
students at the interface, that is, students who had successfully 
completed a Grade Thirteen course, should be able to read the 
passage in acceptable fashion. Only half the university appraisers 
held the same opinion of the abilities of students entering first 
year university courses. 


Other opinions about Part Three of the Speaking Test were 
solicited through three additional questions (see Table A3.15). 
The first of these concerned the estimated difficulty of the 
reading passage. A majority of the appraisers, teaching either 
Grade Thirteen or first year university courses, indicated that 
the level of difficulty was about right for students at the 
interface. A second question asked whether the skills needed to 
read the passage were taught in Grade Thirteen or in first year 
university. Responses, summarized in Table A3.15, indicate that 
this type of skill did receive some degree of emphasis in the 
courses taught by most of the appraisers. Finally, the criteria 
for scoring a rendition of the passage were presented for 
consideration (see Question 3, Table A3.15). Most of the 
secondary school instructors felt that other criteria of equal or 
greater importance should be used in scoring. On the other hand, 
most of the university appraisers said that phrasing and 
pronunciation were adequate in themselves. Those appraisers who 
felt that other criteria should be used had the following 
suggestions: intonation, dramatization or expression, speed, 
stress and use of liaison. 


Part Four of the Speaking Test consisted of two exercises 
in fluency. In one case the student was to tell a story about the 
events depicted in a series of cartoons; in the other case, the 
student was to describe what he/she thought had happened prior 
to, during, and after the event depicted in a single picture. 
Appraisers were asked to estimate the percentage of students who 
could give acceptable responses to each of these tasks. The 
tabulation of these estimates, which appears in Table A3.16, 
indicates that more than one-half of the appraisers expressed the 
opinion that more than 75 per cent of students at the interface 
should have the knowledge required to give an acceptable response 
to the tasks in Part Four of the test. (The large number of 
secondary school appraisers--approximately one-third--who failed 
to respond to the two questions concerning the ability of students 
entering Grade Thirteen courses to give an acceptable response to 
the tasks in Part Four of the test is noted. When the Test 
Appraisal Inventories were given to the appraisers, the response 
space was inadvertently omitted and very few respondents undertook 
to reply without an indicated space.) 


The tabulations of responses to four other questions about 
Part Four of the Speaking Test appear in Table A3.17. From this 
table, it is apparent that fluency of the kind required to do 
this part of the test received some emphasis in the courses of a 
majority of the secondary school appraisers and all of the 
university appraisers who responded. It is additionally noted that 
most appraisers felt that the tasks in Part Four of the test were 
at a reasonable level of difficulty. Finally, it can be seen that 
a sizeable number of the appraisers felt that the criteria of 
total number of clauses, pronunciation, vocabulary, and grammar 
were not sufficient for judging students' responses to the fluency 
tasks. Suggested as additional or alternative criteria were the 
following: ease of expression or fluency, rhythm, speed, clarity, 
coherence, originality, humor, sentence complexity, and idiomatic 
proficiency. 


Finally, the appraisers were asked whether they used tests 
like the Speaking Test in evaluating the fluency with which their 
own students spoke French. The results presented in Table A3.18 
indicate that a majority of both the secondary school and the 
university appraisers made some use of this kind of test, 
although a sizeable fraction did not. 


The following conclusions about the Speaking Test seem to 


be in order: 


Lie) 


(a) The first two parts of the test assessed skills that 
students at the interface should have, but apparently 
most students would have acquired these skills before 
Grade Thirteen. These parts of the test were not 
really geared to the learning students did in the Grade 


hiutleen Weal. 


(b) The oral reading and fluency parts of the test assessed 
skills that most instructors expected most students at 
the interface to have. The tasks contained in these 
parts of the test seemed to be at a reasonable level of 
difficulty... for these, reasons, as,» well cas sfor.sthe 
reason that oral reading and fluency skills received 
some attention in Grade Thirteen and first year 
university courses, the third and fourth parts of the 
test seemed appropriate for use in assessing the French 


skills of students at the interface. 


(c) Other criteria in addition to those used to assess the
oral reading and fluency exercises were judged to be
important by the appraisers. To this extent, the 
assessment provided by these parts of the test will be 


narrower than most appraisers would like. 


2.9 Concluding Statement 


The overall impression conveyed by the appraisals of the Tests of
French as a Second Language is one of qualified acceptance. Some
parts of the tests seemed inappropriate for assessing students at
the interface because they focussed on knowledge that most
secondary instructors expected would have been learned prior to
Grade Thirteen. On the other hand, several parts of the tests
contained exercises judged to be at an appropriate level for


interface students and also to tap skills that are important. 


3. TECHNICAL ISSUES


3.1 Scoring the Multiple-choice Tests


The items in both the Reading Test and the Listening Test were 
of the multiple-choice type, each item having four response 
options. In the instructions for working these tests, students 
were informed that a correction for guessing would be employed in 
scoring. The standard correction for four-choice items is to 
assign each incorrect answer a weight of -1/3. In this scoring 
scheme, correct answers are weighted 1 and omitted questions are 


ignored. 
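
The scoring rule just described can be sketched in a few lines. This is only an illustration: the function name and the example counts are hypothetical, and exact fractions are used so that the -1/3 weight introduces no rounding.

```python
from fractions import Fraction


def corrected_score(num_right, num_wrong, num_options=4):
    """Correction-for-guessing score.

    Each correct answer counts 1, each incorrect answer counts
    -1/(num_options - 1), and omitted items are ignored.
    """
    return num_right - Fraction(num_wrong, num_options - 1)


# Hypothetical student: 30 right, 9 wrong, 1 omitted on a four-choice test.
print(float(corrected_score(30, 9)))  # 27.0
```

A student who answers every item wrongly ends up with a negative score, which is why negative totals can appear in the score distributions.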


3.2 Scoring the Writing and Speaking Tests


These tests posed questions to which the student responded, in 
one case in written form and in the other case in spoken form. 
These responses were then scored by one or more markers using 
procedures that could be applied relatively objectively. These 
procedures were developed for use in the IEA study of achievement 
in French (Carroll, 1975).

Each marking of a Writing Test yielded four numbers. The 
first of these, referred to hereafter as Total Writing I, was the 
number of correct answers to questions about grammar and about 
verbs and modifiers. These questions were contained in the first 
part of the Writing Test. Responses to these questions consisted 
of single words, each of which was scored 0 (wrong) or 1
(correct). As 42 different responses were called for, scores on 


this part of the test could range from 0 to 42. 


The remaining three numbers were assigned to the 
composition written in response to the directed composition 
exercise contained in part two of the Writing Test. These numbers 
Carroll (1975, p. 78) defined as follows: 




a - the number of intelligible clauses; 


b - the number of clauses which are grammatically correct; 


c - the number of clauses which are correct in vocabulary. 


From these numbers, Carroll derived two others:


d - the proportion of clauses grammatically correct, i.e.
b/a;


e - the proportion of clauses correct in vocabulary, i.e.
c/a.


From these numbers, Carroll suggested the computation of the
following three scores:


Writing Quantity = 5b + 2c;


Writing Quality = 40d + 50e;


Total Writing II = Writing Quantity + Writing Quality.


The Speaking Test yielded a number of different scores. The
first was based on performance on the pronunciation exercise. 
There were 16 sentences in this part of the test, each of which 
was to be spoken by the student. Contained in the 16 sentences 
were 29 critical phonemes or phonological features of French. The 
student's rendition of each phoneme or phonological feature was 
scored 0 (unacceptable) or 1 (acceptable). Consequently, the range 
of possible scores on the pronunciation part of the test was 0 to 
29.


The second score was for "structural control". The task a 
student performed in this part of the test could be described as 
follows: he heard a tape-recorded voice ask a question about each 


of 10 cartoons and he was to respond appropriately to the
question within the context set by the cartoon. Each response was
scored on a scale from 0 (low) to 4 (high). The points on the
scale Carroll (1975, p. 72) defined as follows:


0 - no response or a response of 'yes' or 'no', or an
inappropriate or unintelligible response or a response
of "I don't know";


1 - an accumulation of serious errors in the response or 
one serious error accompanied by one or several minor 


errors; 

2 - one serious error or an accumulation of minor errors; 
3 - one or two minor errors; 

4 - completely correct. 


The judgments required to score these responses involved
distinguishing serious errors from minor ones; following the
procedures developed for the IEA study, no list of serious and
minor errors was supplied the marker, but it was hoped that this
distinction was made consistently by the marker throughout the
marking process. The scale for a score on this part of the test
ranged from 0 to 40.


The third part of the Speaking Test was oral reading. The
paragraph that students were asked to read contained 15
sentences, "162 words of relatively simple French prose"
(Carroll, 1975, p. 72). Carroll went on to state: "Scoring was
based on a series of 24 points concerning particulars of
pronunciation (including stress and intonation); a score of 0 or 1
was given for each point depending upon its acceptability. Thus,
the total score could range from 0 to 24" (Carroll, 1975, p.
72).


The final part of the Speaking Test contained two fluency 
exercises. In one exercise, the student was to describe what had 
happened in a series of three cartoon pictures. In the other 
exercise, the student was to look at a cartoon scene and "relate 
the probable action that took place before the events depicted in 
the picture, the actual events, and the probable outcome of the 
situation". Responses to these exercises were scored for the 


following characteristics (Carroll, 1975):

a - the number of intelligible clauses in the response;

b - the number of different grammatical structures
represented in the response;

c - the number of clauses correct in structure;

d - the number of clauses correct in morphology
(inflection);

e - the number of clauses correct in vocabulary;

f - the number of clauses correct in pronunciation.


Once these numbers were determined, the marker made a global 
rating of the two fluency responses on the following scale 
(Carroll, 1975):


0 - very bad or no response;
1 - poor but passable; 

2 - satisfactory; 

3 - good; 


4 - very good.


Carroll (1975, p. 75) suggested the use of the following
formulas: 


Fluency Quantity Score = 2b + 2c + d + 2e + 2f;


Fluency Quality Score = (6c + 3d + 6e + 6f)/a;


Total Fluency = Fluency Quantity Score + Fluency Quality
Score,


where a, b, c, d, e and f are as defined above, and scores are
set to zero when a = 0.


Carroll also suggested computing a total score on the
Speaking Test (1975, p. 76):


Total Speaking Score = Total Fluency Score + 2(p + s + o),
where p = pronunciation score
s = structural control score
o = oral reading score
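

As a worked illustration of the fluency and total speaking formulas above, the following sketch computes the three fluency scores from the six counts and then the total speaking score. The function names and the example counts are hypothetical; only the weights and the zero rule for a = 0 come from the text.

```python
def fluency_scores(a, b, c, d, e, f):
    """Return (quantity, quality, total) for the fluency exercises.

    a..f are the clause counts defined in the text; all three
    scores are set to zero when a (intelligible clauses) is 0.
    """
    if a == 0:
        return 0.0, 0.0, 0.0
    quantity = 2 * b + 2 * c + d + 2 * e + 2 * f
    quality = (6 * c + 3 * d + 6 * e + 6 * f) / a
    return quantity, quality, quantity + quality


def total_speaking(total_fluency, p, s, o):
    """Total Speaking Score = Total Fluency + 2(p + s + o)."""
    return total_fluency + 2 * (p + s + o)


# Hypothetical counts for one student.
quantity, quality, total = fluency_scores(a=10, b=4, c=8, d=9, e=7, f=6)
print(quantity, quality, total)
print(total_speaking(total, p=20, s=30, o=18))
```

Because the quantity score grows with the number of clauses while the quality score is a ratio bounded by the weights, the total is dominated by quantity for talkative students.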


3.3 Difficulty of the Reading and Listening Tests 


The measure of difficulty that was computed for each item in 
these tests was the average across schools of the percentage of 
students within each school who answered the item correctly. This 
measure is inversely related to difficulty in that the larger the 


percentage of correct responses, the easier the item. 
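

The difficulty index described above is an average of school-level percentages, not a pooled percentage over all students. A minimal sketch of that distinction follows; the function name, the data layout (a dictionary of 0/1 item scores per school) and the example data are assumptions made for illustration.

```python
def item_difficulty(responses_by_school):
    """Average across schools of the within-school percentage correct.

    responses_by_school maps a school label to a list of 0/1 scores
    on one item, one entry per student in that school.
    """
    school_percentages = [
        100.0 * sum(scores) / len(scores)
        for scores in responses_by_school.values()
        if scores  # skip schools with no students on record
    ]
    return sum(school_percentages) / len(school_percentages)


# Hypothetical data: school A has 75 per cent correct, school B 50 per cent.
example = {"A": [1, 1, 0, 1], "B": [1, 0]}
print(item_difficulty(example))  # 62.5
```

Pooling the same six students would give about 66.7 per cent, so the two definitions differ whenever schools are of unequal size.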


The difficulty indices for the items in the Reading and 
Listening Tests are reported in Table A3.19. Distributions of
these difficulty indices and the means and standard deviations of 
the distributions are given in Table A3.20. These results 


indicate that, on average, approximately 65 per cent of the Grade
Thirteen students writing these tests were able to respond
correctly to the reading and listening items. The variation of
difficulty indices is substantially larger for the Reading Test
than for the Listening Test, but in each case the range of
indices is from over 90 to less than 30. This degree of
variation in item difficulty is not atypical for tests designed to 
spread out the members of the population being assessed from 
relatively poor performance to relatively good performance on the 


test. 


3.4 Speededness of the Reading and Listening Tests


Evidence of speededness was sought in the percentage of students 
who failed to respond to an item. An arbitrary distinction was 
made, in compiling this evidence, between an omitted item and an 
item that had not been reached. A failure to respond to an item 
was taken as an indication that the item was not reached if the
student failed to respond to all the items that followed the item 
in question in the test. Otherwise the failure to respond was 


classed as an omission. 
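

The distinction between an omitted item and an item not reached can be expressed as a small classification routine. This is a sketch under assumptions: a student's answer sheet is represented as a list with None for every item left blank, and the function name is hypothetical.

```python
def classify_nonresponses(answers):
    """Label each item 'answered', 'omit' or 'not reached'.

    A blank item counts as 'not reached' only if every item after it
    is also blank; any other blank item is an 'omit'.
    """
    last_answered = -1
    for index, answer in enumerate(answers):
        if answer is not None:
            last_answered = index

    labels = []
    for index, answer in enumerate(answers):
        if answer is not None:
            labels.append("answered")
        elif index > last_answered:
            labels.append("not reached")
        else:
            labels.append("omit")
    return labels


# Hypothetical five-item answer sheet.
print(classify_nonresponses(["A", None, "C", None, None]))
```

Under this rule a blank on the final item is always classed as not reached, which is part of what makes the distinction arbitrary.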


The percentages of failures to respond that were classed
"omit" and "not reached" for each item in the Reading Test are
reported in Table A3.21. Because the test was divided into two
separately-timed sections, one would expect to see not reached
percentages different from zero for items toward the end of the
first part of the test. Unfortunately, the fact that the Reading
Test was in two separate parts was ignored in this analysis. 
Consequently evidence of speededness in the first part of the test 


had to be sought in the percentage of omissions. 


If the first part of the Reading Test was speeded, the
percentages of omissions should grow systematically larger from
one item to the next. What happens, in fact, is that the
percentages of omissions vary widely from item to item over the 
17 items in this part of the Reading Test (see Table A3.21). 


Systematic growth in the percentage of omissions, if it occurs at
all, is restricted to items 15, 16 and 17, and given that other
items show even larger percentages of omissions than item 17, it
is possible that the trend in the percentages of omissions for
items 15, 16 and 17 is due simply to the fact that the
corresponding number of students didn't have the requisite
knowledge to answer these questions. Suppose, however, that all
the omissions of items 15, 16 and 17 were caused by test
speededness. One rule of thumb for assessing speededness is to
ascertain whether less than 100 per cent of examinees completed
75 per cent of the test and whether less than 80 per cent of
examinees completed all of the test. If this rule is applied to
the percentage of omissions for items 15, 16 and 17 of the
Reading Test on the assumption that these percentages reflect the
fact that examinees were unable for want of time to complete the
first part of the test, then it must be concluded that the test
was speeded to some extent. But the degree of speeding does not 
appear to be serious.


The percentages of failures to respond classed as not
reached for items 18 onward of the Reading Test provide a fair
assessment of the speededness of the second part of this test. By
the rule of thumb cited in the preceding paragraph, this part of 


the test was only minimally speeded, if at all. 
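

The rule of thumb applied in the two preceding paragraphs can be sketched as follows. The input (the number of items each examinee completed), the function name and the example figures are hypothetical, and treating a failure of either condition as a sign of speededness is an assumption of this sketch; the text only says that both percentages are to be ascertained.

```python
def speededness_check(items_completed, num_items):
    """Apply the 75 per cent / 80 per cent rule of thumb.

    Returns the percentage of examinees who completed at least
    75 per cent of the items, the percentage who completed all the
    items, and a flag that is True when either percentage falls
    short of its threshold (100 and 80 respectively).
    """
    n = len(items_completed)
    pct_three_quarters = 100.0 * sum(
        count >= 0.75 * num_items for count in items_completed
    ) / n
    pct_all = 100.0 * sum(count >= num_items for count in items_completed) / n
    speeded = pct_three_quarters < 100.0 or pct_all < 80.0
    return pct_three_quarters, pct_all, speeded


# Hypothetical group of ten examinees on a 17-item section.
print(speededness_check([17] * 7 + [14, 14, 12], 17))  # (90.0, 70.0, True)
```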


The Listening Test could not be assessed for speededness in 
the way that the Reading Test was, for it was administered in a 
different way. In the Listening Test, students heard a question 
spoken to them by a tape-recorded voice and then they had a brief 
interval of time to respond. This interval might have been too
short for some students, in which case the percentage of persons
not responding to an item would be larger than it would have been
if more time had been allowed for responding. But without any
evidence on what the percentage of failures to respond would be 
if a longer response time had been given, the percentages of 
omissions reported in Table A3.21 are better interpreted as 
evidence of lack of knowledge on the part of a number of the 


students taking the test than as evidence of speededness.


In summary, the figures reported in Table A3.21 do not
create cause for concern about speededness as a factor affecting 
student performance on either the Reading Test or the Listening 


Test. 


3.5 Scores on the Writing Test


This instrument was included in the study because it was felt the 
study would be subject to criticism if writing were not assessed. 
But it was necessary to limit the number of students taking the 
Writing Test for the very practical reason that it was time 
consuming, and therefore expensive to score. In all, only 54


students wrote the test. 


The 54 test papers were scored as follows: They, along 
with a set of scoring instructions, were given to each of two 
markers. These individuals scored the first five test papers 
independently, then met to compare results and discuss
differences. Finally, they independently marked the remaining 


papers. 


The means, standard deviations and low-high values of the 
distributions of scores assigned by the two markers are presented 
in Table A3.22. The four scores are those defined previously in 
the section on scoring the Writing Test. It is clear from these 
figures alone that the markers assigned very similar scores to the 
same test paper. A further indication of the degree of agreement 
the markers achieved is provided by the correlation coefficients 
reported in Table A3.23. Agreement is best for Total Writing I,
a result not surprising in view of the fact that this part of the
test was marked very objectively. The other scores called for
greater exercise of judgement, especially in distinguishing an 
error in grammar from one of vocabulary. Even so, the correlation 


coefficients are acceptably high for this kind of marking task.


The corresponding scores assigned by the two markers to the
same test paper were averaged. These averages were then used to 
compute Writing Quantity, Writing Quality and Total Writing II 
scores. (See the definition of these scores given earlier in the 


report.)


Some important information about the Writing Test is
contained in the means and standard deviations of the
distributions of all the different scores on the test; these are
reported in Table A3.24. Total Writing I, the score for Part One
of the Writing Test, has a fixed scale; the possible range of
scores is 0-42. The 54 students from across Ontario scored, on
average, at approximately the mid-point on the scale. This is an
ideal result for a constructed response test designed to spread
students out in terms of their performance in French writing. The
standard deviation of 7.5 for these scores indicates that the
students were in fact spread widely over the scale in terms of
their performance on the test. Most scores on Part Two of the
Writing Test do not have a fixed scale or fixed upper limit. The
exceptions are the two proportion scores, which obviously range
from 0 to 1. The relatively large standard deviations of the
various Part Two scores indicate substantial variability about the 
means in the performance of the students. In the sense that it 
spread students out in terms of their performance, this part of 


the test was also functioning effectively. 


The Writing Test was used in an international study of 
achievement in French as a second language (Carroll, 1975). 
Results from this study provide a rough basis against which to 
assess some of the means and standard deviations reported in 
Table A3.24. Any comparison of the results achieved in the two 
studies is rough because different individuals were involved in 
the scoring; consequently, it is impossible to say whether the 
scoring criteria were applied in exactly the same way in both 
studies. Nevertheless, it is important to note that where 
comparisons are possible, the means and standard deviations 
reported in Table A3.24 fall within the range of means and 


standard deviations reported for the different countries


participating in the international study (see Carroll, 1975, pp. 
164-165). 


It is possible to obtain a crude estimate of the reliability 
of Total Writing I scores by using the mean and standard
deviation reported in Table A3.24 and Kuder-Richardson formula
21 (Lord and Novick, 1968, p. 91). This estimate is crude
because it is only a lower bound, which is to say, the scores are
very probably more reliable than the estimate suggests. Even so,
the reliability, as estimated by this method, is 0.83, a result
that is acceptably high for this kind of test and within the
range of estimates observed by Carroll (1975, p. 96). The
associated standard error of measurement of 3.1 is, again, well 


within the range of values observed by Carroll for this test. 
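

The figures quoted above can be checked approximately with the usual statement of Kuder-Richardson formula 21, which needs only the number of items, the mean and the standard deviation. In this sketch the 42 items and the standard deviation of 7.5 come from the text; the mean of 21 is an assumption standing in for "approximately the mid-point", since the exact value is in Table A3.24 and not in the text. The function names are hypothetical.

```python
import math


def kr21(num_items, mean, sd):
    """Kuder-Richardson formula 21 reliability estimate."""
    k = num_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


reliability = kr21(num_items=42, mean=21.0, sd=7.5)  # mean is assumed
sem = standard_error_of_measurement(7.5, reliability)
print(round(reliability, 2), round(sem, 1))  # 0.83 3.1
```

With these inputs the sketch gives values that round to the reported 0.83 and 3.1, which suggests the assumed mean is close to the tabled one.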


The data generated by the directed composition exercise in 
part two of the Writing Test cannot be used to estimate
reliability coefficients and associated standard errors of 


measurement for scores on this part of the test. 


The question arises as to whether or not students are 
placed at the same relative positions along the different scales 
of the Writing Test. One way to answer this question is to 
compute coefficients of correlation among the various scores on 
the test. These correlation coefficients are reported in Table
A3.25. Because all the different scores on Part Two of the test 
are based on the same exercise, these scores are not independent 
and some degree of correlation is to be expected. There are, 
nevertheless, several points to be made on the basis of the 


results given in Table A3.25.


(a) Total Writing I is independent of all the other scores.
Its highest correlation with another score is .66 with
Total Writing II. This indicates that the two parts of
the test did allocate students to somewhat different 
relative positions along the corresponding scales of the 


test. In other words, the two tests appear to measure 


two somewhat different, though related, French writing 


abilities. 


(b) The correlation of 0.76 between Writing Quantity and 
Writing Quality is very near the correlations of 0.71 
and 0.75 reported by Carroll (1975, p. 29 and p.
107) for two different groups. 


(c) A further comparison with Carroll's international study 
is possible using the correlations of Total Writing I
with each of Writing Quantity and Writing Quality. In
the present study, these correlations are 0.64 and
0.59 respectively; Carroll reports figures of 0.59 and
0.60.


(d) Total Writing II is very closely related to Writing 
Quantity and its correlation with this variable is much 
higher than it is with Writing Quality. This indicates 
that the majority of variation in Total Writing II is 
accounted for by variance in Writing Quantity. To 
preserve the unique information contained in Writing 
Quality scores, both Writing Quantity and Writing
Quality scores should be included in analyses of
performance on Part Two of the Writing Test and the


single score Total Writing II should be ignored. 


In summary, all indications are that the Writing Test
functioned in an acceptable manner in Project II. Moreover, the
results achieved using this test are about what would be
expected, given the way the test performed in a previous


international study in which it was used. 


3.6 Scores on the Speaking Test


This instrument was also included in the study to avoid the 
criticism that might have been advanced against the study if 


speaking skills had been totally neglected. Nevertheless, it was
necessary, because of the amount of time required to administer
and, subsequently, to score the test, to limit the number of 
students assigned to take it. In all, 58 students attempted at 
least part of the test, and 50 students completed the whole test. 
The results reported here are for the 50 students who completed 
the test. 


The 58 tape-recorded responses were scored by a single 


marker. This person had had previous experience scoring the 


Speaking Test.


The means, standard deviations and minimum and maximum 
scores of the distributions of scores on the Speaking Test are 
reported in Table A3.26. These results should be interpreted, 
bearing in mind that Pronunciation, Structural Control, Oral 
Reading and Global Rating of Fluency have fixed score scales with 
maximum possible scores of 29, 40, 24 and 4, respectively. All
the other score scales do not have maximum possible values. In 
general, the test seems to have produced reasonable results. The 
range of scores is narrower than might have been desired on the 
Pronunciation and Oral Reading scales. But when these results are 
compared with those obtained in an international study (Carroll, 
1975), it is apparent that
the test was functioning in this study in much the same way that 


it had in the international study. 


Crude estimates of reliability were obtained for 
Pronunciation, Structural Control and Oral Reading scores by 
substituting their respective means and standard deviations in 
Kuder-Richardson formula 21 (Lord and Novick, 1968, p. 91). The 
estimate of reliability provided by this formula is crude because 
it is known to be a worse lower-bound estimate than that provided 
by other means. Unfortunately, the data required to apply another 
means of estimating reliability were not available for these 
scores. Besides being crude, it is doubtful that KR-21 is
applicable to the Structural Control scores. KR-21 was derived 
for application to tests comprised of dichotomously scored items. 


The Pronunciation and Oral Reading parts of the Speaking Test may 


be regarded as composed of dichotomously scored items; the marker 
focussed on 29 and 24 linguistic features in these two parts of 
the test and scored each of them 0 for incorrect or unacceptable
and 1 for correct or acceptable. The Structural Control part of 
the test, on the other hand, was composed of ten items and the 
student's response to each item was rated on a scale from 0 to 4 
(see previous discussion of scoring). When KR-21 is used to 
estimate the reliability of Structural Control scores, each score
on the test is being treated as if it arose as a result of 40
separate dichotomous judgments of correct or incorrect, not 10
ratings on a 0-4 scale. Nevertheless, Carroll (1975) used KR-21
with Structural Control and hence it was applied here for


comparative purposes. 


The reliability estimates for Pronunciation, Structural
Control and Oral Reading were 0.52, 0.88 and 0.63 respectively.
Associated with these figures are estimates of the standard error
of measurement: 2.24, 2.29 and 2.22. The estimates of reliability
for the Pronunciation and Oral Reading scores are disappointingly 
low when judged in absolute terms and the associated standard 
errors of measurement are large relative to the standard 
deviations of scores on these parts of the Speaking Test. It is
true, nevertheless, that the reliability estimates and the 
standard errors of measurement of these scores and of the scores 
on Structural Control are well within the range of values 
reported by Carroll (1975, p. 94).


Estimates of reliability for the counts that were made in 
scoring the fluency tasks were achieved in a different way. This 
part of the test was comprised of two exercises. Each exercise 
was scored separately. The estimates of reliability were based on 
correlations between corresponding scores on the two exercises. 
The results are reported in Table A3.27. These figures are very 
close to values reported by Carroll (1975, p. 95) for the same 
scores. (Elsewhere, corresponding scores in both fluency exercises 
are summed to yield a single score of each type. These summed 


scores generated the results reported in Table A3.26).


Because of the nature of the data that were collected and
because all the Speaking Tests were scored by just one person, it 
was not possible to estimate either the reliability of the Global 
Rating of Fluency or the degree of rater consistency that it is 
possible to achieve in the Speaking Test. Carroll reported (1975, 
pp. 94 and 95) estimates of rater consistency that are acceptably 
high for this type of test. There is no reason to believe that 
the trained marker used in the present study would be markedly 
less consistent than the markers who scored the Speaking Test in 


the international study. 


A further study of the Speaking Test was made by 
intercorrelating all the different scores on the test. The
resulting correlations are reported in Table A3.28. Carroll
reported correlations between some of these same variables (1975,
pp. 75, 76, 107) for data collected in the international study.
The correlations he reported among Pronunciation, Structural
Control, and Oral Reading, and the correlations he reported
between these three variables and Fluency Quantity and Fluency
Quality, are somewhat lower than the correlations observed in the
present case. But the correlations Carroll reported among Fluency
Quantity, Fluency Quality, and Total Speaking Test and those
between either Total Fluency or Total Speaking Test and
Pronunciation, Structural Control and Oral Reading are very close 


to the corresponding correlations observed in this study. 


One other conclusion can be advanced on the basis of the 
results presented in Table A3.28. The correlation between Total 
Fluency and Fluency Quantity is almost perfect whereas the
correlation between Fluency Quality and Total Fluency is very
much less (0.77). This indicates that Total Fluency and
Fluency Quantity are redundant and that performance on the fluency 
exercises of the Speaking Test is best described when Fluency 


Quantity and Fluency Quality are included as separate variables. 


From the preceding summary of evidence, it appears that the 


Speaking Test generated results in the present study that are both 


reasonable and very much in line with what would be expected on 


the basis of past experience with the test. 


3.7 Reading Test and Listening Test--Information on
Discriminations, Reliability and Distribution of Scores 


Discrimination is an important characteristic to note about the 
items of tests designed to spread students across the range of 
possible scores. The quality of discrimination can be loosely 
defined as the extent to which student performance on an item is 
correlated with overall performance on the test. The biserial 
coefficient of correlation between item scores and total test 
scores is frequently used as an index of item discrimination. By 
one rule of thumb, biserial coefficients of correlation greater 


than 0.3 can be considered acceptable. 
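

For readers who want to see how such an index is obtained, the following is a minimal sketch of the textbook biserial coefficient, computed from 0/1 item scores and total test scores. It is not taken from the report: the function name and the example data are hypothetical, the population standard deviation is used, and the item must have at least one pass and one failure.

```python
from statistics import NormalDist, fmean, pstdev


def biserial(item_scores, total_scores):
    """Biserial correlation between a 0/1 item and total test score.

    r_bis = ((M_pass - M_fail) / s_total) * (p * q / y), where p is
    the proportion passing the item, q = 1 - p, and y is the standard
    normal density at the point cutting off proportion p.
    """
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(passed) / len(item_scores)
    q = 1 - p
    standard_normal = NormalDist()
    y = standard_normal.pdf(standard_normal.inv_cdf(p))
    return (fmean(passed) - fmean(failed)) / pstdev(total_scores) * (p * q / y)


# Hypothetical item that separates high scorers from low scorers.
item = [1, 1, 1, 1, 0, 0, 0, 0]
totals = [10, 9, 8, 7, 4, 3, 2, 1]
print(biserial(item, totals) > 0.3)  # True
```

The biserial coefficient assumes a normally distributed trait underlying the item and can exceed 1 when that assumption is badly violated, as it is in this artificial example.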


The indices of item discrimination for the items in the 
Reading Test and the Listening Test are in Table A3.29. These 
indices were computed from the test responses of all the students 
who wrote the Reading Test and the Listening Test. These 
students were all enrolled in Grade Thirteen courses in French in 
the participating schools of the study. From the results reported 
in Table A3.29, it can be seen that all items in both tests, 
except for two in the Reading Test--item 18 and item 21--had 
indices of discrimination in excess of 0.3. The two items with 
relatively low discrimination indices were very easy (see Table 
A3.19). This cannot be taken as evidence, however, that all very 
easy items have low discrimination indices; item 1 in the Reading 
Test was also very easy for these students but it has an
acceptably high discrimination index. The obvious conclusion from 
these results is that acceptable levels of item discrimination 


were realized for the vast majority of items in both tests. 


Distribution statistics and information on reliability and 
error of measurement are reported in Table A3.30. It can be seen 
from the means reported in this table that both the Reading Test 
and the Listening Test were relatively easy for Ontario
SSHGD-level students. An ideal mean for the distribution of
corrected-for-guessing scores on a test designed to spread
students over the range of possible scores would be approximately 
one-half the number of questions on the test. The means on both 


the Reading Test and the Listening Test exceed the ideal. 


The distributions of scores on the tests (not shown) were 
differently shaped. Despite its relatively high mean, the 
distribution of corrected-for-guessing scores on the Reading Test
was approximately bell-shaped. Only three persons got a perfect 
score on the test and only one got a negative score. (A negative 
score reflects performance at a level below that which would be 
expected were all questions to be answered by guessing.) The 
distribution of scores on the Listening Test was, in contrast, 
negatively skewed. There was a preponderance of high scores, 
reflected in part by the fact that 14 students achieved perfect 
scores on the test. But the distribution tailed off into the
region of negative scores--5 students performed less well on the 


test than they might have expected to do by guessing. 


The coefficients of reliability for both tests are
acceptably high for the purposes of the study. These purposes are 
to compare the performance of different groups of students and to 


predict school marks using scores on the tests. 


Both the Reading Test and the Listening Test were used in 
an international study of achievement in French (Carroll, 1975). 


Carroll reported (see p. 93) means, standard deviations, 
reliability coefficients and standard errors of measurement for 
these tests for each of several different countries.
Unfortunately, the figures Carroll reported in his discussion of 
the technical qualities of the tests are not directly comparable 
with those obtained in the present application of the tests. 
There are two reasons for this: (i) Both the Reading Test and the 
Listening Test were modified somewhat for administration to
Ontario students. The wording of the Reading Test was changed in
14 places; the last six items of the Listening Test used in the


international study were deleted in adapting it for use in
Ontario. In addition, the procedure for administering five items
of the Listening Test was changed; instead of hearing the 
material for items 30 to 34 of the test spoken once, as in the 
international study, students in the present study heard the 
material spoken twice. All these changes were made in an effort 
to render the tests more appropriate for use with Ontario 
students. (ii) Carroll did not employ a correction-for-guessing in 
his analysis of test reliability, although the students were told 


a correction would be applied. 


Because of the differences in the testing methods used in 
the two studies, it is difficult to predict how all the Ontario 
results would change if the method of the international study had 
been strictly followed. It is probable that the mean reported in 
Table A3.20 for the Listening Test is lower than it would have 
been had the method of testing used in the international study 
been followed exactly, but the effect of the differences in 
method on the standard deviation, reliability coefficient and 
standard error of measurement is difficult to predict. Even so, 
the results obtained in the present study fall within the range of 
results reported by Carroll. In comparison with the results 
achieved in the international study, the Reading Test and the 
Listening Test both seem to have functioned well in the present 


application. 
TABLE A3.23


Correlation Coefficients Between the Scores Assigned
by Each of Two Markers to 54 Writing Test Papers


Variable                                      Coefficient
Total Writing                                 [illegible]
Number of Clauses                             0.72
Number of Grammatically Correct Clauses       0.84
Number of Clauses with Correct Vocabulary     0.84
APPENDIX A4


TECHNICAL REPORT ON THE TEST
DE CONNAISSANCE DE LA LANGUE (ANGLAIS)


This test was prepared for administration to Francophone students at 
the SSGD and SSHGD levels to assess selected aspects of their 
competence in the use of English. The purpose of this report is to
describe the contents of the test in some detail, to summarize the 
responses of a group of secondary and postsecondary instructors who 
appraised the test, and to provide some information about technical 


matters related to the test. 


1. DESCRIPTION OF TEST CONTENT


The Test de connaissance de la langue (anglais) contained two 


parts. The first part assessed some aspects of the ability to 
read English for comprehension. The second part was designed to 


assess some aspects of the ability to write in English. 


1.1 Reading Comprehension Part


This part of the test, which was prepared at the University of 
Michigan, consisted of four passages, each accompanied by five 
multiple-choice questions. The passages ranged in length from 
approximately 150 words to approximately 200 words and dealt 
with the following topics: Viking invaders of England, systems of 


writing, describing a sailor, and poetry and the arts. The 


multiple-choice questions associated with the passages could be 
classified roughly as assessing three different types of
comprehension: literal understanding of a passage, identification
of the main purpose of a passage, and ability to draw inferences


or conclusions from a passage. 


1.2 Writing Exercise


This part of the test consisted of a passage written in English. 
The student was to read the passage, and then to write a 100- to
150-word summary of the main points in the passage. Finally, the 
student was to write a brief statement of his own opinion about 


the issue dealt with in the passage. 


Obviously, more than writing ability, however defined, was 
required to perform the "writing" test. The added component was, 
again, ability to read for comprehension. The passage which 
formed the basis of the test was part of an article published in 
a Canadian magazine. At a length of approximately 550 words, it 
was considerably longer than the four passages contained in the 


"reading comprehension" part of the test. 


2. TEST APPRAISAL


The Test de connaissance de la langue (anglais) and the related 
Test Appraisal Inventory were sent to practising teachers as a 


means of sampling professional opinion about the test. Among 
those asked to respond were teachers in the fourteen Francophone 
secondary schools of the study who taught English or anglais 
courses to Grade Twelve and Grade Thirteen students. In addition, 
the test and inventory were sent to some of the instructors in 
two Ontario universities and three Colleges of Applied Arts and 
Technology teaching English to Francophone students in first year 
university and college courses. Because there was no attempt to 


draw a probability sample of instructors at either the secondary 
or the postsecondary level, the responses that were received
cannot be used to draw inferences about the opinions held by the 
population of individuals in the province giving English language 
instruction to Francophones. The responses can only be said to 
constitute the collective opinions of the 41 secondary school 
teachers and the 18 college and university instructors who 


undertook to comment on the tests. 


A tabulation of coded responses to each question contained 
in the inventory is presented in Table A4.1. There are several 
features of the table that deserve mention here: (i) The secondary 
school teachers responded in terms of courses for Francophones in 
English or anglais at either the Grade Twelve or Grade Thirteen 
level. Thus, the responses of the 41 secondary school teachers 
are divided among four columns, one column for each type of 
course at each grade level. The number of responses to a question,
when summed over the four columns, totals more than 41, however,
because some teachers responded for two or more courses of
different types and grade levels. (ii) Secondary courses were not
differentiated by "general" and "advanced" designation. Although
this would have been desirable, it would have made the inventory
more complex and time-consuming to complete. (iii) The responses
of college and university instructors were not separated for the 
purpose of tabulating results but appear in a single column of the 
table. Too few instructors from either colleges or universities 
responded to make such a separation worthwhile in the sense of 
revealing meaningful differences. (iv) The postsecondary appraisal 
inventory overlapped the secondary inventory but contained fewer 
questions, a fact that accounts for the absence of postsecondary 
tabulations for Questions 15, 16, 17, 22 and 23. (v) The number
of responses that were tabulated is not necessarily the same from 
question to question for either the secondary or the postsecondary 


group because some questions were omitted by some respondents. 
In addition to coded responses, the instructors who 


completed the appraisal were invited to offer written comments at 


appropriate places throughout the questionnaire. These comments 
have been used to elaborate the impressions provided by the coded


responses. 


2.1 Reading Comprehension Part 


Responses by instructors at both the secondary and postsecondary 
levels to the first question in the inventory (see Table A4.1) 
indicated good support for the inclusion of reading comprehension 
in a test of English for Francophones. Regarding the four reading 
passages included in the test, there was general agreement among 
all instructors, regardless of level of course being considered, 
that the difficulty of the reading passages as a set was about 
right (see tabulation for Question 2, Table A4.1). When judged 
on an individual basis, the first reading passage was regarded as 
somewhat easy, whereas the second and fourth passages were 
thought to be somewhat difficult (see tabulations for Questions
3, 4 and 6). Despite this, the majority of instructors agreed that
the four passages were representative of the material students in
Grades Twelve and Thirteen should be able to read with
comprehension. Moreover, the tabulation of responses to Questions
8-11 indicated that the majority of instructors, again without
regard for level of course, considered none of the passages as 
inappropriate for reasons other than difficulty. Written comments 
from those instructors who did object to a passage for reasons 
other than difficulty covered a variety of complaints. These 
included an objection to the first passage because it was
"culturally loaded" and therefore uninteresting; an observation
that the second passage involved a technical subject with
specialized vocabulary; and a comment that the fourth passage was
a "foolish piece, filled with simplifications and
popularizations". It must be remembered, however, that these were
reactions from a minority of the appraisers. In fact, one
appraiser said of a passage that it was similar to what would be
expected in literature and composition classes at the
postsecondary level.


The multiple-choice questions about the reading passages 
could be characterized as assessing (i) the ability to comprehend 
the literal meaning of a passage, (ii) the ability to identify the 
main idea or objective of a passage, and (iii) the ability to 
draw conclusions or implications from a passage. Instructors at 
all levels were asked to estimate the number of students who--on 
entry into their courses--could read successfully for any one of 
these purposes. The tabulations for Questions 12, 13 and 14 of 
Table A4.1 support the reasonable expectation that, as the level 
of instruction increases from Grade Twelve through Grade Thirteen 
to postsecondary, the level of expectation of instructors for what 
students can do when they enter a course also increases, and that 
expectations of instructors, regardless of level of instruction, 
are higher for literal comprehension ability than for ability to 
identify the main idea or objective; expectations for the latter 
ability are, in turn, higher than for the ability to draw
conclusions and implications. When these results are coupled with 
the tabulations of responses that secondary instructors gave to 
questions about the degree of attention accorded reading for 
literal comprehension, for identification of main ideas, and for 
conclusions and implications (see the tabulations for Questions 
15, 16 and 17), it becomes clear that, however limited the set 
of reading comprehension abilities assessed may have been, they 
are the skills that the group of secondary instructors who 
responded emphasize in their courses and that the students of the 
postsecondary instructors who responded are expected to possess in 


some measure on entry into first year courses. 


The limitations of the reading comprehension test were 
emphasized in the responses to Questions 18 and 19. On the order 
of one-half the instructors, regardless of level, indicated that 
there were important aspects of reading comprehension not measured 
by this test. In their written responses, they indicated that the 
test did not assess such things as a student's understanding of 
vocabulary and his reading speed. One respondent even questioned 
whether or not the test items did actually assess ability to 
interpret or draw inferences, as had been suggested in the test 


appraisal inventory. There was a strong body of opinion among 
these instructors that multiple-choice tests of the sort used to
assess reading comprehension were inefficient and inadequate
measures of the three abilities noted above. Concern was
expressed about the effects of chance success in the scores
obtained on the tests, and about the need for some kind of 


written exercise to assess understanding and ability to interpret. 


If there are general conclusions to be drawn about the 
reading comprehension part of the Test de connaissance de la 
langue (anglais), they would seem to be (i) that the test was 
limited in obvious ways because of the use of multiple-choice 
questions, and (ii) that specific aspects of certain passages and 
questions were bothersome to one or another of the instructors 
who appraised the tests, but (iii) that, when these limitations 
are taken into account, the test was a satisfactory means of 
assessing some important components of reading comprehension 
ability. 


2.2 Writing Exercise


The Test de connaissance de la langue (anglais) posed, in a second
part, the task of reading a passage, summarizing its contents in 


a précis of 100-150 words, and writing a commentary of 100-150 
words on the points made by the author of the passage. As the 
tabulation of responses to Question 20 reveals, most instructors 
felt that a composition task is either important or very
important in an assessment of language competence. The tabulation 
of responses to Question 21 indicates that the composition task 
included in the test was one that most instructors would expect 
most students to be able to do when they entered a particular 
course, although expectations were higher for students entering 
Grade Thirteen and postsecondary courses than for students 
entering Grade Twelve courses. The responses of secondary 
instructors to Question 22 suggest that development of the skills 
needed to perform a composition exercise such as the one that was 


included in the test is an objective of virtually all the
secondary instructors, although degree of emphasis varies to a


considerable extent. 


A further indication of the suitability of the writing 
exercise is provided by responses to Questions 24 and 25. The 
majority of instructors agreed, in response to Question 24, that 
the exercise was pitched at an appropriate level of difficulty, 
although the extent of agreement was, not unexpectedly, somewhat 
less in the responses for Grade Twelve courses. In response to 
Question 25, most instructors agreed that a single writing 
exercise was not an unfair basis for assessing students, although 
there were written comments in which the particular writing 
exercise used in the test was criticized. Among the written 


comments were the following suggestions: 


(a) that the student be offered a choice of topic--choice 
being essential when personal opinion--even on general 


interest subjects--is asked for; 


(b) that the writing exercise be supplemented with an oral 
test so that questions could be asked about what a 


student had written; 


(c) that the writing test be supplemented with a listening 
test in which the student took notes, much as in


lecture classes; and 


(d) that a speaking exercise be added to the test. 


Question 25 of the appraisal inventory asked the instructors 
to rate various criteria for grading the writing exercise. The
tabulation of ratings provides evidence of the extent of
disagreement amongst the instructors as to the most important
quality of good writing--structure; logical presentation; style;
or grammar, usage and mechanics. Only the criterion
"Choice of words" was rated low by most secondary instructors and
relatively low by postsecondary instructors. In their written 


comments, the respondents drew attention to such things as unity,
coherence, and clarity as other criteria of importance. Another
respondent commented that it was difficult for him to distinguish 
criteria when it came to judging a student's composition and that 


the most important thing was "clean, succinct communication". 


Other written responses about the writing exercise pointed 
to its difficulty and to the limitation of the length of the
précis to only 100-150 words. A few instructors thought that the 
exercise was too difficult and that a précis of 200 words should 


have been judged acceptable. 


In conclusion, it seems fair to say that the respondents to 
the Test Appraisal Inventory agreed on the need for a writing 
exercise in a test of language competence and felt that the 
writing exercise used in the test was an appropriate one for 


assessing some important aspects of language competence. 


2.3 Multiple-Choice Questions vs Writing Exercise 


Four questions at the end of the appraisal inventory solicited 
opinions about the use of multiple-choice tests for assessing 
various aspects of language competence and about the need for both 
a writing exercise and a multiple-choice test. The conclusion 
that may be framed in light of the tabulations of responses to 
Questions 27, 28, 29 and 30 is that the respondents saw a place 
for multiple-choice questions but that they thought that, when 
language competence was being assessed by means of both 
multiple-choice questions and a writing exercise, the assessment 


provided by the writing exercise should be accorded more weight. 
3. TECHNICAL ISSUES


3.1 Scoring the Reading Comprehension Part 


This part of the test contained 20 multiple-choice items, each 
having four options. As indicated in the instructions to students 
taking this test, a correction for guessing was applied. This 
correction was the standard one for four-choice items, giving 
each incorrect answer a weight of -1/3. A correct response was


scored 1 and omitted questions were ignored in the scoring. 
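
The scoring rule just described can be illustrated with a minimal sketch. The function name, the use of None to mark an omitted item and the sample answer key are assumptions made for the illustration; they are not taken from the scoring program actually used in the study.

```python
from fractions import Fraction


def formula_score(responses, key, n_options=4):
    """Return the guessing-corrected score for one student.

    responses: chosen options, with None marking an omitted item.
    key: correct options, in the same order as responses.
    """
    penalty = Fraction(1, n_options - 1)  # 1/3 for four-option items
    score = Fraction(0)
    for chosen, correct in zip(responses, key):
        if chosen is None:
            continue  # omitted items are ignored in the scoring
        if chosen == correct:
            score += 1  # a correct response is scored 1
        else:
            score -= penalty  # an incorrect response is weighted -1/3
    return score


# Hypothetical 20-item key and one student's answer sheet:
# 14 correct answers, 3 incorrect answers and 3 omissions.
key = ["A", "B", "C", "D"] * 5
responses = key[:14] + ["A", "A", "B"] + [None, None, None]
print(formula_score(responses, key))
```

Under this rule a student with 14 correct answers, 3 incorrect answers and 3 omissions receives 14 - 3(1/3) = 13, and a student who answers entirely at random has an expected score of zero.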


3.2 Scoring the Writing Exercise


The writing exercise consisted of a passage to be summarized and 
commented upon. This exercise was scored for the adequacy of the 
summary and for the quality of the commentary. In scoring the 
summary, markers awarded one mark for each idea in the passage, 
up to a maximum of 10, and deducted marks for excessive length 
and for misrepresentation of ideas in the passage. The commentary 
was scored holistically on a scale from 1 for low quality to 10 


for high quality. 


The task of scoring the writing exercises was accomplished 
as follows. All exercises except 20 were scored by three 
different markers. The special set of 20 exercises was scored by 
all six of the persons engaged to do the marking. Scores on the 
special set of 20 exercises were used to implement a procedure 
for adjusting the scores assigned by different markers. 
Adjustments were necessary to remove differences among the
markers in the average mark assigned, the range of marks assigned 
and the reliability of marking. The procedure used to adjust 
marks is described in Appendix D2. In the end each writing 
exercise had assigned to it only two scores, one for the summary 
and one for the commentary, each representing a synthesis of the 


marks assigned by the different markers. 
3.3 Item Difficulty in the Reading Comprehension Part


Difficulty is indexed as the average over the Francophone schools 
in the study of the percentage of students within a school who 
responded correctly to a multiple-choice item. Given the sampling 
design of the study, this percentage constitutes an estimate of 
the fraction of students in the Francophone population who would 
be able to answer the question correctly. (For a definition of 


this population, see Chapter Two of this report.) 
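
As an illustration of this definition, the sketch below computes the index as the unweighted average of within-school percentages. The data layout (one list of 0/1 item scores per school) and the function names are assumptions made for the example, and the treatment of an omitted item as not correct is likewise an assumption.

```python
def school_percent_correct(item_scores):
    """Percentage of one school's students who answered the item correctly.

    item_scores: one entry per student, 1 for a correct response and
    0 otherwise (omissions are assumed here to count as not correct).
    """
    return 100.0 * sum(item_scores) / len(item_scores)


def difficulty_index(schools):
    """Average the within-school percentages, giving each school equal weight."""
    percents = [school_percent_correct(scores) for scores in schools]
    return sum(percents) / len(percents)


# Two hypothetical schools of different size.
small_school = [1, 1, 0, 0]              # 50 per cent correct
large_school = [1, 1, 1, 0, 1, 1, 1, 1]  # 87.5 per cent correct
print(difficulty_index([small_school, large_school]))
```

Because each school carries equal weight, the index for these hypothetical data is the average of 50 and 87.5, not the pooled figure of 75 per cent (9 of 12 students) that would result from ignoring school membership.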


The difficulty index of each item for Grade Twelve students 
and for Grade Thirteen students is reported in Table A4.2. A 
mean and standard deviation of the difficulty indices are also
reported for each grade. It is apparent from these statistics that 
the reading comprehension part of the Test de connaissance de la 
langue (anglais) was relatively easy for students in either grade, 
although it was clearly easier for the average Grade Thirteen 
student than for the average Grade Twelve student. This
conclusion is reinforced by the frequency distribution of item
difficulty indices that appears in Table A4.3. All items except
one have percentages of correct responses above 50 for the Grade 
Twelve students and above 60 for the Grade Thirteen students. In 
conclusion, if the objective of testing is to spread students out 
just as much as possible in terms of their performance on the 
test, then the reading comprehension part of the Test de
connaissance de la langue (anglais) was too easy for Grade 
Thirteen students. An average item difficulty somewhat higher 
than 0.60, but not as much higher as 0.76, can be regarded as 


ideal. 


3.4 Speededness of the Reading Comprehension Part 


The time allowed students to work on the reading comprehension 
part was 30 minutes. This appears to have been more than enough 
time for the vast majority of students to attempt all the items 
in the test. In analyses of multiple-choice tests, the convention 


sometimes followed is arbitrarily to decide that a student has 
not reached an item if he fails to answer it and all succeeding
items in the test. Otherwise, an item that has not been answered 
is classified as an omitted item. In the analysis of Grade Twelve 
responses to the reading comprehension part of the Test de 
connaissance de la langue (anglais), no omission of an item was
classified as not reached, except for omissions of the very last
item. It is a matter of choice whether to class failures to 
respond to the last item in a test as evidence of too little 
working time or as evidence of an inability to identify the 
correct answers. In the analysis of Grade Thirteen responses, 
only one per cent of students could be classified as not having 
reached the second last item and six per cent were classified as 


not having reached the last item. 
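
The convention for separating omitted items from items not reached can be sketched as follows. The representation of an unanswered item as None and the label strings are assumptions made for the illustration.

```python
def classify_blanks(responses):
    """Label each item 'answered', 'omitted' or 'not reached'.

    An unanswered item counts as not reached only when every later item
    is also unanswered; any other unanswered item is an omission.
    """
    # Position of the last item the student answered (-1 if none).
    last_answered = max(
        (i for i, r in enumerate(responses) if r is not None), default=-1
    )
    labels = []
    for i, r in enumerate(responses):
        if r is not None:
            labels.append("answered")
        elif i > last_answered:
            labels.append("not reached")
        else:
            labels.append("omitted")
    return labels


# Hypothetical five-item answer sheet: item 2 is skipped and the
# student stops after item 3.
print(classify_blanks(["A", None, "C", None, None]))
```

Under this rule a blank on the final item is always classified as not reached, which is why the interpretation of failures to respond to the last item is described above as a matter of choice.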


Additional evidence on the lack of speeding is contained in 
the percentages of omissions that are reported in Table A4.2. 
For every item, these are very small or zero, suggesting that 
students had ample time to go back over their work and attempt 


questions that were not answered the first time through the test. 


In general, then, it can be concluded that speededness was 


not a problem on the reading comprehension part of the Test de 


connaissance de la langue (anglais).


3.5 Reading Comprehension Part: Information on Discrimination, 
Reliability and Distribution of Scores 


Discrimination can be loosely defined as the extent to which 
performance on an item is correlated with overall performance on 
a test. In tests designed to spread students over the range of 
possible scores, it is generally held that items should have high 
indices of discrimination. The biserial coefficient of correlation 
between scores on an item and scores on the test of which the 
item is a part is used as an index of discrimination. One rule 
of thumb for assessing this index is to judge it in relation to 
0.3: indices higher than this are acceptable, indices lower than 


this are unacceptable. 
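
The report does not state the computing formula that was used for the biserial coefficient. The sketch below assumes the usual textbook form, r = ((M1 - M0) / s)(pq / y), where M1 and M0 are the mean test scores of students who passed and failed the item, s is the standard deviation of test scores, p is the proportion passing the item, q = 1 - p, and y is the ordinate of the standard normal curve at the point dividing it into areas p and q. The use of the population standard deviation and the sample data are assumptions.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item, total):
    """Biserial correlation between a 0/1 item score and the total score."""
    p = mean(item)  # proportion of students answering the item correctly
    q = 1 - p
    if p in (0, 1):
        raise ValueError("item must be passed by some students and failed by others")
    mean_pass = mean(t for i, t in zip(item, total) if i == 1)
    mean_fail = mean(t for i, t in zip(item, total) if i == 0)
    # Ordinate of the standard normal density at the point that cuts
    # the distribution into proportions p and q.
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean_pass - mean_fail) / pstdev(total) * (p * q / y)


# Hypothetical item scores and total scores for eight students.
item = [1, 1, 1, 0, 0, 1, 0, 1]
total = [18, 10, 16, 11, 9, 14, 13, 12]
r = biserial(item, total)
print(round(r, 2), "acceptable" if r > 0.3 else "unacceptable")
```

The final line applies the 0.3 rule of thumb mentioned above. With strongly non-normal score distributions this formula can yield values greater than 1, which is one reason the index is best treated as a rough guide.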

The indices of discrimination for items in the reading
comprehension part of the Test de connaissance de la langue
(anglais) are reported in Table A4.4. These figures were derived
from an analysis of the responses of all the students at both the
SSGD and SSHGD levels who wrote the test. It can be seen that
the indices reported in Table A4.4 are all above 0.3. A
conclusion regarding the acceptability of the item discrimination
indices for this test is obvious.


Other information about the reading comprehension part of 
the Test de connaissance de la langue (anglais) is provided in 
Table A4.5. The fact that this part of the test was relatively 
easy for the group of students who wrote it is reflected in the 
relatively high mean. As might be expected, given that the test 
was easy, the distribution of scores (not shown) was negatively 
skewed. One indication of this skewness was the fact that 15 
students answered all 19 questions correctly, but only three
students had below-zero scores. (Negative scores reflect a level 
of performance below that which would be expected if responses 


were made entirely at random.) 


As regards reliability, the figure 0.71 is adequate given 
the brevity of the test and the purposes for which it was
intended in this study. These purposes were to make comparisons 
among different groups of students in terms of their performance 


on the test and to predict school marks. 


3.6 Reliability of Scoring the Writing Exercise 


An estimate of the reliability with which the writing exercise of 
the Test de connaissance de la langue (anglais) was scored was 
obtained from a factor analysis involving the 20 exercises that 
were scored by all six markers. Using scores on these exercises 
as observations on the markers, a 6 by 6 matrix of the
coefficients of intercorrelation among the markers was obtained. 
When this matrix was factor analyzed, a single common factor was 


obtained. The squared factor loading of a marker on this factor 
constitutes a lower-bound estimate of the reliability with which
the marker scored the essays. (The theoretical rationale 
underlying this assertion is provided in Harman, 1960, Chapter 
i) 
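
The factoring method used in the study is not described here, so the following is only a sketch of the underlying idea under a stated assumption: if a single common factor accounts exactly for the intercorrelations, the correlation between markers j and k equals the product of their loadings, and the squared loading of marker j can be recovered from any "triad" of correlations as r_jk r_jl / r_kl. The function name and the loadings in the example are hypothetical.

```python
from itertools import combinations


def squared_loadings(corr):
    """Estimate each marker's squared loading on a single common factor.

    corr: square matrix (list of lists) of intercorrelations among markers.
    For marker j the triad estimate r_jk * r_jl / r_kl is averaged over
    all pairs (k, l) of the other markers.
    """
    n = len(corr)
    result = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        estimates = [
            corr[j][k] * corr[j][l] / corr[k][l]
            for k, l in combinations(others, 2)
        ]
        result.append(sum(estimates) / len(estimates))
    return result


# Hypothetical loadings for six markers; the correlation matrix is
# built so that one common factor reproduces it exactly.
loadings = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
corr = [
    [1.0 if j == k else loadings[j] * loadings[k] for k in range(6)]
    for j in range(6)
]
for marker, value in enumerate(squared_loadings(corr), start=1):
    print(marker, round(value, 2))
```

With real marking data the triads for one marker will not agree exactly, and a fitted factor solution would normally be preferred to this simple averaging.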


Two estimates of reliability were obtained in this way for 
each marker: the first was an estimate of the reliability with 
which the summary portion of the writing exercise was scored; the 
second estimate was of the reliability with which the commentary 
part of the exercise was scored. These estimates of reliability 


are reported in Table A4.6. 


It is apparent from the numbers presented in Table A4.6 
that the summary part of the writing exercise was scored much 
less consistently than the commentary part. This may be a 
reflection of the different scoring procedures that were followed 
for each part. The summary was scored by counting the number of 
ideas in the summary that were present in the paragraph that was 
to be summarized, and by making deductions for excessive length 
and misrepresentation of ideas. The commentary, on the other 
hand, was to be scored holistically, by assigning scores on a 
scale from 1 to 10 so as to represent the quality of the 
commentary. Obviously, the markers achieved much higher agreement 
in scoring the commentary, using the apparently more subjective, 
holistic judgment approach, than in scoring the summary, using 


the seemingly more objective approach of counting ideas. 


The question arises of whether these estimates of 
reliability are adequate for the purposes of the study. These 
purposes consist of making comparisons among various subgroups of 
students and predicting school marks. It is obvious that the 
scores on the commentary, with their higher reliability of 
scoring, will serve these purposes better than the scores on the 
summary. The relatively low reliability of the summary scores 
means that the "confidence interval" for an intergroup difference 
will be larger than it would have been had a higher degree of 
reliability been achieved. Also, the variance of errors of 


estimate about the line regressing school marks on test scores 
will be larger than it would have been had the reliability been
higher. 
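
The effect of unreliable scoring on prediction can be illustrated with the classical attenuation relation, under which the observed correlation between test scores and a criterion equals the correlation for perfectly reliable scores multiplied by the square root of the reliability. The sketch assumes that school marks are themselves measured without error, and the numerical values are hypothetical; they are not the figures of Table A4.6.

```python
import math


def attenuated_correlation(true_r, reliability):
    """Correlation expected when the predictor has the given reliability."""
    return true_r * math.sqrt(reliability)


def residual_variance_fraction(true_r, reliability):
    """Share of criterion variance left unexplained by the regression line."""
    r = attenuated_correlation(true_r, reliability)
    return 1.0 - r * r


# Hypothetical case: error-free scores would correlate 0.6 with marks.
for reliability in (0.9, 0.5):
    print(
        reliability,
        round(attenuated_correlation(0.6, reliability), 3),
        round(residual_variance_fraction(0.6, reliability), 3),
    )
```

The lower the reliability, the smaller the observed correlation and the larger the share of variance in school marks left about the regression line, which is the pattern described in the paragraph above.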


The less reliable the measurement of a variable, the more 
that chance (random) factors affect the results and the less 
likely it is that observed differences between groups or observed 
coefficients of regression will be judged to be significant, in a 
statistical sense. Nevertheless, the relatively low reliability of 
the summary scores does not provide grounds for excluding them 
from the study; they may be found to yield results of interest 
despite their low reliability.
TABLE A4.1

FREQUENCY TABULATION OF CODED RESPONSES IN APPRAISALS OF THE
TEST DE CONNAISSANCE DE LA LANGUE (ANGLAIS)

Columns: Secondary (N=41): Grade 13, Grade 12; Post-Sec.
[Response frequencies are illegible in the source.]


1. Do you consider that a test of reading comprehension, not
   necessarily the one used, assesses an important component of
   language achievement at this level or at the beginning of
   post-secondary study?

   (1) Yes


Questions 2-6: Apply the following responses and corresponding
response codes:

   (1) too easy
   (2) somewhat easy
   (3) about right
   (4) somewhat difficult
   (5) too difficult


2. What is your overall assessment of the difficulty level of four
   reading comprehension passages?


Assess the difficulty of each passage separately in terms of the
above set of responses:

3. Passage on Viking invaders

4. Passage on systems of writing

5. Passage describing the sailor

6. Passage on poetry and the arts


7. Are the four passages representative of the material you would
   expect students at this level to be able to read with
   comprehension?

   (1) Yes
   (2) No


Questions 8-11: Is any passage inappropriate for reasons other
than difficulty?

8. Passage on Viking invaders

   (1) Yes
   (2) No

9. Passage on systems of writing

   (1) Yes
   (2) No

10. Passage describing the sailor

   (1) Yes
   (2) No

11. Passage on poetry and the arts

   (1) Yes
   (2) No


Generally the items appear to test the student's ability in the
following areas:

   I.   Literal understanding of the passage
   II.  Identification of main idea or purpose
   III. Inference or implication


Questions 12-14 concern the issue of whether students should have
these three abilities on entry to English or Anglais courses at
these levels or before entry to post-secondary study. Select your
answers to these questions from among the following options:

   (1) All the students
   (2) More than 75% of the students, but not all
   (3) 51% to 75% of students
   (4) 26% to 50% of students
   (5) One or more, but less than 26%
   (6) None


12. How many of the students entering courses at this level should
    have the ability to read a passage for literal understanding?

13. How many students should be able to identify the main idea or
    purpose of a passage?

14. How many students should be able to draw inferences and see
    implications?


Questions 15-17 ask for an assessment of the amount of emphasis
each of the three reading abilities receives in the courses of
each type that you teach. Select your answer to these questions
from among the following options:

   (1) Heavy emphasis
   (2) Moderately heavy emphasis
   (3) Light emphasis
   (4) Individual, remedial emphasis only
   (5) No emphasis


15. What emphasis is given to reading for literal understanding?

16. What emphasis is given to reading to identify the main idea or
    purpose?

17. What emphasis is given to drawing inferences and seeing
    implications?
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TABLE A4,1 (Continued) 


Secondary Post- 
Sec. 
Grade 13 Grade 12 
ates 
. 7) F 


WRITING EXERCISE 


Note: In responding to 
questions 20-23 inclusive, 
doinot reter in particular 
to the writing task assigned 
in this study. 


Anglais 


20. What importance do you 
place upon a sample of 
the student's writing 
in an evaluation of 
language competence? 


Cee SS CMa ae ek oie fetes ese ses 


(2) Important but not 
SS SCM Cl ied as rete Aeper crear sv atotie! whey ses 


(3) Of minimal importance 
ATIC wULCe late Vice mesretencnaronais cihcre ie eso. 


(4) Neither important nor 
WISCLUG Scale deters sce aie 6 eialaie stele wr e.e 


21. How many students should 
be able to produce an 
acceptable piece of 
writing of this type upon 
entry to courses at these 
levels or upon entry to 
college or university? 


(1) All
(2) More than 75%, but
not all
(3) 51% to 75%
(4) 26% to 50%
(5) One or more, but
less than 26%
(6) None


TABLE A4.1 (Continued)


Secondary Post-
Sec.


Grade 13 Grade 12 


Are there important 
reading skills that 
have not been tested 
in the tests used 
but that should have 


been? 
(1) Yes
(2) No


Is the multiple-choice 
format a reasonable 
method of assessing at 
least those three reading 
comprehension abilities 
mentioned above? 


(1) Yes
(2) Yes (Qualified)
(3) No
TABLE A4.1 (Continued)


Secondary Post- 
Sec. 


Grade 13 Grade 12 


How much emphasis do you 
give in your teaching to 
the development of student 
competence in this type of 
writing? 


(1) Heavy emphasis


(2) Moderately heavy
emphasis
(3) Light emphasis
(4) No emphasis


How many of the students 
who successfully complete 
courses at this level 
should be able to produce 
an acceptable piece of 
writing of this type? 


(1) All
(2) More than 75%, but
not all
(3) 51% to 75%
(4) 26% to 50%
(5) One or more, but
less than 26%
(6) None


TABLE A4.1 (Continued)


Secondary 


Sec. 


Grade 13 Grade 12 


Note: Answer questions 24-26 


inclusive with reference to 
the specific writing assignment 
given. 


24. Was the assignment at a 
reasonable level of 
difficulty for students 
in courses at this level 
or upon entry to college 
or university? 


(1) Yes


25. Was the restriction to a
single topic unfair to 
students in a test of 
writing competence? 


(1) Yes


26. Rank, in order of importance, 
the following criteria for 
evaluating this type of 
writing. Using the numbers 
1 to 5 (1 = High, 5 = Low),
write the ranks in the five 
spaces provided in each column. 


Response Codes 


1. Organization 




2. Logic, use of evidence 


TABLE A4.1 (Continued)


Grade 13 (English, Anglais)
Grade 12 (English, Anglais)
Post-Secondary


Response Codes (see previous page)


3. Style (chiefly the
sentence)


4. Grammar, usage, mechanics


5. Diction
TABLE A4.1 (Continued)


OTHER QUESTIONS 


20s 


ZO 


ZO 


Do you think a test in
multiple-choice format
is in general a satisfactory
means of assessing
students' competence in
such areas as style,
grammar, sentence structure
and vocabulary?


(1) Yes
(2) Yes (Qualified)
(3) No


Where multiple-choice tests 
are used, should they be 
supplemented by other 


measures? 
(1) Yes
(2) No


How do you regard the use of 
both a multiple-choice test 

of language achievement and 

a sample of writing in 
assessing language competence? 


(1) The multiple-choice test 
is satisfactory by itself 


(2) The use of both is
important


(3) The writing sample is
satisfactory by itself


(4) Neither is particularly
satisfactory


ZO? 


Secondary Post- 
Sec. 
Grade 13 Grade 12 


TABLE A4.1 (Continued) 


Secondary Post- 
Sec. 
Grade 13 Grade 12


If a student's relative
standing on a writing test
were different from the
student's relative standing
on a multiple-choice
test of language achievement,
which would you consider the
most valid measure of the
student's language competence?


(1) the score on the writing
test
(2) the score on a multiple-
choice test
(3) a combined score weighted
in favour of the writing
test
(4) a combined score weighted
in favour of the multiple-
choice test


(5) a combined score giving
equal weight to both tests
TABLE A4.2


Percentage of Students at Each of Two Grade Levels
Who Correctly Answered and Who Omitted Questions in
the Reading Comprehension Part of the Test de
connaissance de la langue (anglais)


Grade 12 Grade 13 
Item Percent Percent Percent Percent 
No. Correct Omitting Correct Omitting 
M| 82 Z 2.6) 0 
Z 56 0 61 3 
3 ED: y vee 4 
4 78 Z 84 7 
> 81 0 25 0 
6 63 0 83 0 
i 25 8 28 , 
8 69 it 80 Z 
9° E a ai = 
10 oa! 2 65 1 
ua 85 0 89 Mt 
Migs DD) 0 74 iy 
UNS) 97 0 98 0 
14 65 il 13 it 
ls) SY) ] 67 1 
16 74 0 80 0 
ey 55 Z 66 ss 
18 59 i 70 i 
Ale) G7 0 94 ] 
20 61 6 78 6 
Mean 66 76 
Sy 16 nS 


(a) Results are not reported for this item inasmuch as it was
judged to have two correct answers.
TABLE A4.3


Frequency Distributions of Difficulty Indices
for Items in the Reading Comprehension Part of the


Test de connaissance de la langue (anglais)


Percentage 


Correct Grade 12 Grade 13 
90-99 { 4 
80-89 4 5 
70-79 2 5 
60-69 4 4 
Doe: 7 ~ 
40-49 = = 
30-39 = - 
2029 

n bibs J ge 




TABLE A4.4 


Discrimination Indices for the Items in the
Reading Comprehension Part of the Test de
connaissance de la langue (anglais)
(N = 470)


Discrimination


Item Index 
i} 47 

2 45 

3 62 

4 oye) 

5 68 

6 62 

vi a9. 

8 56 

9 25 
10 Bp) 
ti 54 
12 49 
13 46 
14 6 3 
15 50 
16 58 
Ley, 61 
ine) 56 
hg 52 
20 ou 
Mean Do 
5.0% 6 


Note: Decimal points have been omitted. 


(a) The biserial correlation between item scores and total
test scores, uncorrected for the inclusion of the item in
the test, is the index of item discrimination.

(b) The item was declared faulty in that it had two correct
answers. It was excluded from subsequent analysis.


TABLE A4.5 


Distribution Statistics and Information on the 
Reliability and Standard Error of Measurement 
of the Reading Comprehension Part of the 


Test de connaissance de la langue (anglais) 


Number of Examinees 470 
Number of Test Items® Ly 
Mean, BLD) 
Ss Ael7 
Highest Score Po 
Lowest Scor =25 02 
Reliability d On 71. 
Standard Error of Measurement Darel 


(a) The number of items in the test was 20 but one was
declared faulty because it had two answers and was
omitted.

(b) This statistic is for the distribution of corrected-for-
guessing scores before any adjustment was made for the
effect of absenteeism.


(c) Negative scores arise because a correction for guessing
was applied. Wrong answers were scored -1/4, and correct
answers 1. Omitted questions were ignored in scoring.


(d) Estimated from a formula due to Hoyt (1941).
TABLE A4.6


Data on the Reliability With Which the 
Writing Exercise Was Scored 


Estimates of Reliability 


Marker Summary Score Commentary Score 
1 wz Al 5 Hil 
2 61 .88 
5; 49 85 
4 Ds ow 
5 83 .o0 
6 Wd a0 
Mean 5 DS 80 
52D . 210) 10 


APPENDIX A5 


TECHNICAL REPORT ON THE PHYSICS ACHIEVEMENT TEST 


(This Appendix was prepared by John Hattie, a Graduate Student at 


the Ontario Institute for Studies in Education.) 


1. DESCRIPTION OF TEST CONTENT


The test under consideration in this report is one form of the 
1970 version of the Ontario Physics Achievement Test. This test 
was part of a battery called the Ontario Tests for Admission to 


College and University. 


The Physics Achievement Test is based on the Grade Thirteen
curriculum formulated by the Ontario Department of Education
(Curriculum S.17C, 1967). The main theme in this curriculum is
the wave-particle duality of radiation and matter. The curriculum is
divided into four units. Only the first three units are assessed
by the test.


Unit I is entitled "Time, Space, and Motion", and is a
general introduction to the concepts of time and space. It
focusses on fundamentals of measurement, functions, motion along
a straight line path, and motion in space. Unit II uses the study
of light as a vehicle to show how physical models or theories are
developed. It is called "Optics and Waves", and concerns the
behaviour of light, the particle model of light, waves propagated
in one and two dimensions, interference, and the light and wave
model. Unit III is called "Mechanics" and deals with the
dynamics of particles. Specifically, the topics covered are the
law of inertia and Newton's third law, motion in the earth's
gravitational field, the universal law of gravitation, the solar
system, momentum and the conservation of momentum, work and
kinetic energy, and potential energy.


The Physics Achievement Test consists of 18 items on
material in Unit I (items 1-9, 31-39), 19 items for Unit II
(items 10-19, 40-48), and 23 items for Unit III (items 20-30,
49-60). The topics covered are: Graphical solutions (14 items:
Lpetsige Gotebaerlloy, Al sie28 ane baw oe doe SOM Ga 
Ratio and ProportionncdlOmitems: 6236055 oh); 012535 403! ,01535nm oe 
39, G25, 452)3 ‘Abstractions ‘(ly Weitemse: 04,827 42> ee Ue 
39, G46, 52° 50, 59): Linear «Kinematics Chl “bens so) bana geo 
6 10g" 215. 22; <35, °° S37 “95, 54-2 Non-linear) Kinematicsomics 
items:.ichy 68. 09 3 N24 ReZD Hh32 ye SOR Oy aDynanics Wioiittenssace., 
25.126 27 ELBURN ea eae mime ome: (Sil) (5 2e NSS mmcae 
STy 585) 594 60) suVectors mG ip titemskieo ee fees } aOh2 Seer an26) 
HiGhEE(9 Jatemdesehs idle Maer ROM ad Ga adiay 40) and 
Waves i (Sititems cbSt Woenthe, ONS ,bdinet oui 45). 


The abilities tested by the items in this instrument are as 


follows: 


(a) Ability to demonstrate an understanding of basic
scientific concepts and principles.
Students are required to demonstrate their understanding
of facts and their ability to reason with them, rather
than their ability merely to recognize facts.


(b) Ability to apply scientific concepts and principles.
To respond correctly to items measuring this ability,
students must not only understand a principle, but they
must also recognize how to apply it to many different
situations, both in the physical world and in the
laboratory. They are required to deduce specific
inferences from broad generalizations and concepts.


(c) Ability to handle quantitative relationships. 
This concerns reasoning in quantitative terms. That is, 


students must be able to understand the "how much" or 
the "how little" involved in scientific problems and be 
able to work out the numerical relationships between 


quantities. 


(d) Ability to interpret cause-and-effect relationships 
Students are asked to select the probable causes for 
happenings in nature from a list of principles and 
concepts related in other ways to the happenings. In 
addition, they are asked to predict the effect that 


altering one or more variables may have on a system. 


(e) Ability to apply laboratory procedures and interpret 
experimental data. 


It is expected that students who take this test will 
have had laboratory experience of a sort that 
familiarizes them with methods of formulating and 
attacking experimental problems. Since physics is an 
experimental science, students who have studied it 
should be familiar with the types of experimental 
results commonly obtained in physics and should know 


how to interpret such results. 

Below are two examples of the items contained in the test:


1. Two sources, S1 and S2, separated by distance d,
illuminate a screen a distance L away. Which one of
the following is a necessary condition for an
observable stationary interference pattern to be
produced on the screen?


(A) Radiations from S1 and S2 differ in wavelength.


(B) The frequency of source S1 is very close to but
not equal to the frequency of the source S2.


(C) The distance d is large compared with the distance
L.


(D) One source is farther from the screen than the
other source.


(E) There is a constant phase difference between source
S1 and source S2.


Answer: To answer this item it is necessary to realize
that, assuming the frequencies of the sources S1 and
S2 to be identical, there must be a fixed phase
relation between the sources in order to produce a
stationary interference pattern. Hence the answer is
(E).


2. A steel ball of mass 0.1 kilogram is fired due west at
a speed of 90 meters per second. After collision with
another steel ball it is moving at 120 meters per
second due south. The magnitude of the change of its
momentum is:


(A) 21 kg-m/sec 
CBr kg-m/sec2 
(C) 3 kg-m/sec 
(D) 15 kg-m/sec 
(E) 15 kg-m/sec2 


Answer: To answer this question correctly it is
necessary to find the difference between the initial and
final momentum vectors with the aid of the Pythagorean
relation. The correct units for momentum must also be
considered. The answer is statement (D).


The test was translated from English into French for
administration to Francophone students.
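
The arithmetic behind the second example item can be checked with a short sketch. The function name and the axis convention (east as +x, north as +y) are assumptions made for illustration; they do not come from the test itself.

```python
import math


def momentum_change_magnitude(mass_kg, v_initial, v_final):
    """Return |p_final - p_initial| for 2-D velocities given as (east, north) in m/s."""
    dpx = mass_kg * (v_final[0] - v_initial[0])
    dpy = mass_kg * (v_final[1] - v_initial[1])
    # The two momentum components are perpendicular, so the Pythagorean
    # relation gives the magnitude of the change.
    return math.hypot(dpx, dpy)


# Assumed axes: due west is -x, due south is -y.
v_before = (-90.0, 0.0)    # 90 m/s due west
v_after = (0.0, -120.0)    # 120 m/s due south
delta_p = momentum_change_magnitude(0.1, v_before, v_after)
print(f"{delta_p:.1f} kg-m/sec")
```

The components of the change are 9 kg-m/sec eastward and 12 kg-m/sec southward, which combine to 15 kg-m/sec, in agreement with option (D).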


2. TEST APPRAISAL


Test Appraisal Inventories were sent to the teachers of Grade 
Thirteen physics in the 67 secondary schools, both Anglophone and 
Francophone, in the study. Teachers in 62 schools responded. 
Sixty-two English language, and 11 French language teachers 
responded. University professors teaching first year courses May sa 
Ontario universities were also sent the inventory. Fifteen 
English language and 6 French language professors completed it. 
The secondary school teachers constitute a reasonably
representative sample of physics teachers in the province. The
university professors, however, are not a probability sample of
university professors since names were selected quite arbitrarily 
and a number of those who were asked to respond did not reply. It 
must also be remembered that as the number of postsecondary 
teachers is small, the precision of results for this group is not 
high. 


2.1 Coded Responses to the Appraisal Inventory

The Test Appraisal Inventory aimed at judging the kind of 
knowledge required to answer the questions, and at assessing 
whether the content of the items concerned knowledge that students 
know or should know. The secondary school teachers were asked to 


consider each item in terms of: 


A. Old knowledge that students should have on entry to the
course:
A1. This knowledge is not reviewed in the course.
A2. This knowledge is reviewed in the course.


B. New knowledge that all students are expected to learn
in the course.


C. New knowledge that some students are expected to learn
in the course:
C1. Only 1 per cent to 25 per cent of students should
learn this.
C2. Only 26 per cent to 50 per cent of the students
should learn this.
C3. Only 51 per cent to 75 per cent of the students
should learn this.
C4. More than 75 per cent but not all students should
learn this.


D. New knowledge that no student is expected to learn.


The university professors assessed each item in terms of the
following categories:


A. Old knowledge that students should have on entry to the
course:
A1. This knowledge is not reviewed in the course.
A2. This knowledge is reviewed in the course.


B. New knowledge that all students are expected to learn
in the course.


C. Other.


The instructions for the Test Appraisal Inventory received
some comment. One teacher had difficulty distinguishing between
knowledge that is reviewed and new knowledge that all students
are expected to learn in the course. One teacher asked whether or
not new knowledge that some students will be taught was the same
as new knowledge some students will learn. Another queried
whether emphasis was on material taught in enough depth so that
the student could attempt the question, or on an estimate of what
the student would actually do with the question in the test
situation. One teacher asked whether the questions were to be
classified according to what he hoped to teach if he had time, or
according to what he actually had taught. Finally, one teacher
noted that if he taught the content, he would expect 95 per cent
to learn that material.


The frequency tabulations for each category across each item
are presented in Table A5.1. In an attempt to discover which
items were thought to test knowledge that students at the
interface should have, the frequencies of secondary school
appraisals falling in categories A1, A2, B, and C4 were
accumulated for each item. (These reflect judgments that 75 per
cent or more of students at the end of Grade Thirteen should have
acquired the knowledge to answer the item correctly.) Also, the
frequencies of university appraisals falling in categories A1 and
A2 were accumulated. (These reflect judgments that students who
enter a first year university course in physics should already
have acquired the knowledge tested by the item.) These
accumulations are reported in Table A5.1 as Total 1. Total 2
constitutes the sum of the remaining appraisal categories.
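
The accumulation described above can be sketched as follows. The category counts are hypothetical and are not taken from Table A5.1, and the function name is illustrative. The sketch also assumes that Total 1 is expressed as a percentage of all appraisals received for an item.

```python
# Categories that count toward Total 1, as described in the text.
TOTAL_1_SECONDARY = {"A1", "A2", "B", "C4"}
TOTAL_1_UNIVERSITY = {"A1", "A2"}


def total_1_percentage(counts, total_1_categories):
    """Percentage of appraisals for one item that fall in the Total 1 categories."""
    n_appraisals = sum(counts.values())
    if n_appraisals == 0:
        raise ValueError("no appraisals recorded for this item")
    n_total_1 = sum(n for category, n in counts.items()
                    if category in total_1_categories)
    return 100.0 * n_total_1 / n_appraisals


# Hypothetical tabulation for a single item from 62 secondary school teachers.
example_counts = {"A1": 2, "A2": 10, "B": 30,
                  "C1": 1, "C2": 2, "C3": 5, "C4": 10, "D": 2}
percentage = total_1_percentage(example_counts, TOTAL_1_SECONDARY)
meets_criterion = percentage > 75.0  # the 75 per cent threshold used in the text
print(round(percentage, 2), meets_criterion)
```

Total 2 is then simply the complement, 100 minus the Total 1 percentage, under the same assumption.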


The mean Total 1 percentage for English language secondary
school teachers was 90.30 (standard deviation = 5.91), for
French language secondary school teachers 83.46 (s.d. = 1.89),
for English language university professors 56.89, and for French
language university professors 60.83.


For the English language secondary school teachers, all but
three items were judged to demand knowledge that more than 75
per cent of the appraisers said that 75 per cent or more of
students at the interface should have. The three items not
included were item 27, which concerns dynamics and a graphical
solution; item 38, which concerns ratios and proportions,
abstractions, and non-linear kinematics; and item 39, which
concerns non-linear kinematics and ratios and proportions. For the
French language teachers there were 11 items that were classified
as requiring knowledge that less than 75 per cent of appraisers
said that 75 per cent or more of the students should have. Of
these 11 items, 5 concerned Unit I material, 3 concerned Unit II
material, and 3 concerned Unit III material.


For the university professors, a very low proportion of 
items met the criterion of testing knowledge that 75 per cent or 
more of the appraisers indicated students should have on entry to 
university. There were only 14 items which more than 75 per cent 
of the English language professors said tested knowledge the 
students should have on entry to a university course. Of the 46 
items not included, 7 concerned Unit I material, 18 concerned 
Unit II material, and the other 21 concerned Unit III material. 
For the French language professors, there were only 17 items
which more than 75 per cent claimed tested old knowledge. Of the
43 items not included, 7 concerned Unit I material, 18 concerned
Unit II material, and 18 concerned Unit III material.


These results from the appraisal of the Physics Achievement 
Test/Test de rendement en physique suggest the conclusion that 
most secondary school teachers regarded the test as appropriate 
for students at the interface. University instructors, on the 
other hand, tended to view the test more negatively, most finding 
relatively few items that would test knowledge expected of 


entering first year students. 


2.2 Written Responses to the Appraisal Inventory


There was an opportunity on the appraisal form for written 
comments about the test. Among the secondary school teachers 
there were three appraisers who responded that the physics test 
was suitable for Grade Thirteen students and that the questions 
were well written. There was, however, at least one expression
of concern over the fact that material on the test could not be
covered for each of the following reasons: shortage of time--a
result of the Toronto teachers' strike; semestering, which put
spring semester students behind students in schools not on a
semester schedule; and insufficient time, due to the fact that
fewer hours are assigned to the Physical Sciences Study Committee
course here in Ontario than in the United States, where it
originated. Further, one teacher commented that when the goals of 
the course were set a few years ago, only the top quarter of the 
students studied physics, whereas more than one-half of Grade 
Thirteen students now take the same course. Because of this, the 
ability range of students is wider than it formerly was; hence 
the teaching techniques required to cope with students nowadays 
are quite different from the techniques required in the past. 
These reasons could account for a number of teachers claiming 
that some topics covered in the test were used as enrichment 


material only. 


Seven secondary school teachers expressed concern about the 
conditions under which the test was administered. They claimed 
that the students were not adequately forewarned about the test 
and there was little incentive for students to perform as well as 
they could. 


For many of the above reasons it was noted by two teachers 
that test results should not be compared with other tests 


administered on earlier occasions. 


One teacher commented that an all or nothing marking system
disadvantaged the students, particularly as a review was not
undertaken. His argument was that many students might remember
the process but not the exact formula. Although these individuals
should be able to eliminate some incorrect responses, if they
chose the wrong answer they would receive no credit. (It should
be pointed out in response to this comment that students writing
the physics test were told that guessing would be penalized; the
application of a correction for guessing has the effect of
crediting partial knowledge.)


A pot-pourri of additional general comments was made by
secondary school teachers. Notation was mentioned--units in the
test were not Système International, and teachers used d, not s,
for distance. It was claimed that the ability to memorize
formulae and the topic of waves received too much emphasis in the
test. Some teachers felt that some topics which should have been
tested were not; topics mentioned included qualitative
electromagnetism; modern models of light charge and matter; units
of measure and scientific notation; electrostatics; nuclear and
atomic physics; vectors or components; waves; particle theory of
light and the nature of light. Other appraisers noted that there
were few items on electricity and atomic structure. (It is
obvious that these teachers failed to remember that Unit IV of
the Grade Thirteen curriculum was not intended to be assessed.
Unit IV is entitled "Electricity and Atomic Structure" and
covers electromagnetism, electrostatics, and nuclear and atomic
physics.)


The secondary school teachers made other comments about 
specific items. The most frequent comment was that a particular 
item was "not covered" in the course. This comment was made 111 
times in all and there were only 15 questions that did not 
receive this comment at least once. Seventeen items were 
considered to be enrichment material by one or two teachers. Five 
questions were regarded by up to three teachers as testing
mathematics, not physics. Two
teachers claimed students could have answered items 12 and 13 
correctly if there had been time to review formulae. Four 
teachers each found an item that had, in their opinion, incorrect 
answers or insufficient information (items 14, 27, 35, and 40).
(This contention is, of course, disputed by the experts who
constructed the test and, presumably, also by the vast majority 
of appraisers who did not make a similar comment.) There was a 
complaint about the fact that three items could only be answered 
correctly if previous questions had been answered correctly (item 
16 depended on 15; 35 and 36 depended on 34; and 39 depended on 
5S). 




Some items elicited comments reflecting differences in
teaching styles. For example, one teacher saw the wording of
item 19 concerning waves as being ambiguous, while another was
concerned that the details necessary for answering the question
were soon forgotten; another claimed that numbers should have
been used, and yet another claimed that the use of graphs was
misleading. The use of graphs in five questions (items 19, 27,
28, 30, and 35) received negative comments from five teachers,
and 20 teachers each found one item that required a degree of
abstraction deemed to be beyond the Grade Thirteen student. One
teacher claimed that the picture accompanying item 46 on
wavelengths was ambiguous and another teacher contended that item
48 on waves required awareness and recall of an actual
experiment.


These comments by the secondary school teachers centred 
primarily around the lack of time for review and preparation for 
this test and lack of time to teach the curriculum. The other 


comments concerned specific details of the test. 


The university professors also made commendatory and 
critical comments. The type of physics course each professor 
taught seemed to influence his comment. One said, for example, 
that the test was good for students taking physics for 
engineering, while another, who taught physics for music, claimed 
the test was inappropriate. Another noted that much of the 
material that concerned first principles was automatically 
reviewed when a higher level was being taught. It was reported by 
one professor that questions given to university students tended 


to be slightly more quantitative than those on the test. 


The number of comments from the university professors on 
specific items was small. Some concerned items that were seen by 
the professors as pertaining to material not covered in a first 
year course. Altogether, there were 20 occasions when the 
comment "not covered in the course" was placed by an item. 
Specifically mentioned as not covered were items relating to
optics, variable forces, relative acceleration, nodal lines and
wave motion. Two items (number 42 and number 50) were
identified--each by a different single appraiser--as not having a
correct answer.


Overall, the appraisers' comments reflected a concern about 


material not covered in their courses. 


3. TECHNICAL ISSUES


3.1 Scoring


Each item was scored 1 if correct, -0.25 if incorrect, and 0
if omitted. Prior to taking the test, students were told that
when the test was scored, a percentage of the wrong answers would 
be subtracted from the number of right answers as a correction 
for haphazard guessing. However, if they had sufficient knowledge 
of the problem to eliminate one or more choices as definitely 
wrong, they were informed that it would be to their advantage to 


guess among the remaining choices. 
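
The scoring rule and the advice given to students can be illustrated with a minimal sketch. The function names and the use of `None` to mark an omitted item are assumptions made for illustration; only the weights 1, -0.25, and 0 come from the text.

```python
def corrected_score(responses, key, penalty=0.25):
    """Score a response sheet: +1 if correct, -penalty if wrong, 0 if omitted.

    `responses` and `key` are equal-length sequences of option labels;
    an omitted item is represented by None (an assumption of this sketch).
    """
    score = 0.0
    for given, correct in zip(responses, key):
        if given is None:
            continue  # omitted items neither add to nor subtract from the score
        score += 1.0 if given == correct else -penalty
    return score


def expected_gain_from_guessing(options_remaining, penalty=0.25):
    """Expected score from guessing at random among the options not yet eliminated."""
    p_right = 1.0 / options_remaining
    return p_right * 1.0 - (1.0 - p_right) * penalty


key = ["E", "D", "A", "C"]
sheet = ["E", "B", None, "C"]       # two right, one wrong, one omitted
print(corrected_score(sheet, key))  # 2 - 0.25 = 1.75
```

With five options, a blind guess has an expected gain of zero, while eliminating even one option makes the expected gain positive (1/4 - 3/4 x 0.25 = 0.0625), which is why students were told it would be to their advantage to guess among the remaining choices.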


3.2 Difficulty


The mean uncorrected test score for the 641 English students who 
wrote the test was 19.56, with a standard deviation of 8.90. 
The range of scores was from 2 to 53. For the 82 Francophone 
students, the mean uncorrected test score was 14.13, with a 


standard deviation of 6.81, anda range Prom; Os to: saul. 


An index of difficulty was estimated for each item in the 
test as follows. The percentage of students answering an item 
correctly in a given school was obtained. This percentage was 
then averaged over all the Anglophone (Francophone) schools to 
obtain the Anglophone (Francophone) index of difficulty for an 


item. 
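
The two-step procedure described above (a percentage correct within each school, then an unweighted average over schools) might be sketched as follows. The data layout, one list of 0/1 item scores per school, and the function names are assumptions made for illustration.

```python
from statistics import mean


def school_percent_correct(item_scores):
    """Percentage of students in one school answering the item correctly (scores are 0/1)."""
    return 100.0 * sum(item_scores) / len(item_scores)


def difficulty_index(schools):
    """Average the per-school percentages, giving each school equal weight."""
    return mean([school_percent_correct(scores) for scores in schools])


# Hypothetical data for one item in two schools of different size.
schools = [
    [1, 0, 0, 0],                    # 25 per cent correct
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # 20 per cent correct
]
print(difficulty_index(schools))
```

Because each school counts equally, the index (22.5 here) can differ from the pooled percentage over all students (3 of 14, about 21.4), a consequence of averaging over schools rather than over students.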


The mean difficulty indices for the items in various
subgroups are reported in Table A5.2. The overall mean difficulty
index was 33 (standard deviation = 16) for the Anglophone
students. For the Francophone students the mean was 24 (standard
deviation = 16). Both the low mean score on this test and the
low mean index of item difficulty point to the fact that the test
was very difficult for the students who wrote it. In tests
designed to spread students over the possible range of scores, it
is common to strive for an average item difficulty at the point
midway between perfect performance for everyone on an item and
performance at a level that could be expected from pure guessing.
For a test composed of five option multiple-choice items, this
ideal level of difficulty is approximately 60 per cent of correct
answers. The Physics Achievement Test is obviously very much more
difficult than this.
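
The 60 per cent figure follows directly from the midpoint rule stated above, as this small sketch shows. The function name is illustrative.

```python
def ideal_difficulty(n_options):
    """Midpoint between chance-level and perfect performance, in per cent correct."""
    chance_level = 100.0 / n_options  # expected per cent correct from pure guessing
    return (100.0 + chance_level) / 2.0


print(ideal_difficulty(5))  # midway between 20 and 100
```

For five-option items the chance level is 20 per cent, so the midpoint is 60 per cent, well above the observed mean indices of 33 and 24.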


3.3 Appraisers' Expectations and Difficulty


To assess whether the appraisers' judgments about the items were
related to item difficulty, a coefficient of correlation was
computed between the two variables. These correlations for each
set of appraisers are presented in Table A5.3. The estimates of
difficulty that were used to compute the coefficients for
Anglophone appraisers were from Anglophone students; the
difficulty estimates used to compute the coefficients for
Francophone appraisers were from Francophone students. The
relationship between the appraisers' comments and item difficulty
is similar, and moderately high; correlations of this size
reflect the fact that 10 per cent of the variance in item
difficulty can be predicted from the appraisers' judgments.
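
The link between a correlation coefficient and the share of variance predicted is the square of the coefficient. The sketch below assumes a Pearson product-moment correlation, which the text does not name explicitly, and uses illustrative function names.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """Proportion of variance in one variable predictable from the other."""
    return r * r


def correlation_for_share(share):
    """Absolute correlation that corresponds to a given proportion of variance."""
    return math.sqrt(share)


print(round(correlation_for_share(0.10), 2))
```

Under this assumption, a share of 10 per cent of the variance corresponds to a correlation of about 0.32 in absolute value.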


3.4 Discrimination 
A measure of the extent to which an item discriminates is the 


biserial correlation coefficient between scores on an item and 


scores on the total test. In tests designed to spread students
over the range of scores, it is considered desirable to have
biserial correlation coefficients of 0.3 or higher. The biserial 
correlation coefficients were computed for the items in this 
test. The indices were computed separately for Anglophone and 
Francophone students. A frequency distribution of these 
coefficients is reported in Table A5.4. Seven items in the test 
for English students, and 18 items in the test for French 
students had substandard indices of discrimination. On the whole, 
however, the items discriminated at an acceptably high level; the 
mean discrimination index for English students was 0.45, and for
French students the mean was 0.42.
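
A biserial correlation between a right/wrong item and total test scores can be computed with the standard textbook formula, sketched below. The function name and data layout are assumptions, and the report does not state which computational form was used.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item_scores, total_scores):
    """Biserial correlation between a 0/1 item and total test scores.

    Uses r_bis = ((M1 - M0) / s) * (p * q / y), where M1 and M0 are the mean
    totals of students who got the item right and wrong, s is the standard
    deviation of all totals, p is the proportion correct, q = 1 - p, and y is
    the standard normal ordinate at the point cutting off proportion p.
    """
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not right or not wrong:
        raise ValueError("item must have both correct and incorrect responses")
    p = len(right) / len(item_scores)
    q = 1.0 - p
    standard_normal = NormalDist()
    y = standard_normal.pdf(standard_normal.inv_cdf(p))
    return (mean(right) - mean(wrong)) / pstdev(total_scores) * (p * q / y)


# Hypothetical data: six students, one item, and their total scores.
item = [1, 1, 1, 0, 0, 0]
totals = [30, 25, 18, 22, 12, 9]
r_bis = biserial(item, totals)
is_substandard = r_bis < 0.3  # the 0.3 guideline mentioned in the text
```

Unlike a Pearson coefficient, a biserial estimate is not bounded by 1 in small or unusual samples, so values from a toy data set like this one should not be read as typical.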


3.5 Reliability


The Hoyt estimate of reliability for uncorrected scores for
Anglophone students was 0.87, with a standard error of
measurement of 3.21, and for Francophone students was 0.81, with a
standard error of measurement of 2.93. (See Hoyt, 1941 for the
formulae used to compute this coefficient of reliability and
standard error of measurement.) These results compare favourably
with the KR(20) estimate of the reliability of uncorrected scores
(0.86) and the associated standard error of measurement (3.3)
reported in the 1970 test manual.
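
Hoyt's estimate is obtained from a persons-by-items analysis of variance, and the standard error of measurement follows from the reliability and the score standard deviation. The sketch below is one way to compute both; the function names and the small score matrix are illustrative, and the code is not a reproduction of the procedure actually run for this study.

```python
import math


def hoyt_reliability(scores):
    """Hoyt (ANOVA) reliability: 1 - MS_residual / MS_persons.

    `scores` is a list of rows, one per examinee, each holding that
    examinee's item scores.
    """
    n = len(scores)       # examinees
    k = len(scores[0])    # items
    grand = sum(sum(row) for row in scores) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1.0 - ms_residual / ms_persons


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical 4 examinees x 3 items matrix of right/wrong scores.
matrix = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(hoyt_reliability(matrix))
print(round(standard_error_of_measurement(8.90, 0.87), 2))
```

Using the Anglophone figures quoted earlier (standard deviation 8.90, reliability 0.87), the second function gives about 3.21, matching the reported standard error. For right/wrong items the Hoyt coefficient is algebraically the same as KR(20), which is why the two estimates can be compared directly.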


3.6 Speededness 


Using criteria formulated by Educational Testing Service, a test
is considered unspeeded if at least 80 per cent of those writing
reach the last item, and all candidates reach the three-quarter
mark. For this administration of the Physics Achievement Test /
Test de rendement en physique, statistics on the percentage of
students not reaching a given item are presented in Table A5.5.
For the English students, 3 per cent of the students did not
reach the three-quarter mark and 37 per cent did not reach the
last item. For the French students, 4 per cent did not reach the
three-quarter mark, and 59 per cent did not reach the last item.
Hence the test was somewhat speeded. It should be noted that this
test was very difficult for most candidates and it may be that
the items near the end of the test were reached but not
attempted.


TABLE A5.2


Mean Percentage of Students Responding Correctly to the
Items in Each of Several Categories of the
Physics Achievement Test / Test de rendement en physique


Type                 No. of Items    English    French
Unisel 18 38 30 
Urata ee 19 Via) 2D 
Unite tT vae} jel 18 
Graphs 14 By ae 
Ratio and Proportion 10 Bil 24 
Abstractions seu Zz Ne, 
Linear Kinematics dias oft Die 
Non-Linear Kinematics 8 Sirs: 34 
Dynamics 1) 30 20 
Vectors 7 49 42 
akalalae 9 Z> Zul 
Waves 8 32 vad) 
APPENDIX A6
TECHNICAL REPORT


MATHEMATICS ACHIEVEMENT TEST 


The Mathematics Achievement Test was administered to 1,404 students 
potentially eligible in June 1976 for a Secondary School Honour 
Graduation Diploma (SSHGD). These students were drawn from 52 of 
the 53 Anglophone schools involved in the study (the remaining school 
contains no SSHGD-level students). The population tested included 
students enrolled during the school year 1975-1976 in both Calculus 


and Relations and Functions. 


The test had originally been administered as Part I of the
Ontario Mathematics Achievement Test, Form OB, in May 1968. It
consisted of 36 multiple-choice 5-option items; the time allowed was
60 minutes. The description of the test's content which follows is
excerpted from the Technical Report on the original administration,
written by Dorothy M. Horn and Wilfred G. Futcher:


The items were originally written by Grade 13 mathematics 
teachers in the Province of Ontario and by members of the Ontario 
Mathematics Achievement Test Committee. This committee reviewed all 
items, deleting some and revising others. They selected items for 
tryout tests, given in the Fall, 1967, and later made the selection 
of items for the test, Form OB. 


The items tested were distributed among the following three 


categories: 


I. Grade 13, Analysis 
A. Function as a mapping
B. Second degree relations in the plane 
C. Trigonometry 
D. Transformations in the plane 
E. Slopes and simple derivatives 
F. Applications of differentiation 


II. Basic principles from earlier grades related to topics 


such as: 
A. Linear and quadratic functions 
B. Exponents and logarithms 
C. Circles
D. Sequences and series 
III. Miscellaneous (items which cannot be identified clearly 
with a particular grade level or a particular topic in 
the Courses of Study for Grades 10, 11, 12 or 13). 


The taxonomy classifications were: 


A. Knowledge and information: definitions, notations, 


concepts; 


B. Techniques and skills: solutions; 


C. Translation of data into symbols or schema and vice 


versa; 


D. Comprehension: capacity to analyze problems and to 


follow reasoning; 
E. Inventiveness: reasoning creatively in mathematics. 


Some further comments should be made concerning the test 
content. The test was, of course, developed at a time when 
Senior Division Mathematics courses had a structure quite 
different from the present one. Consequently, the fit between 
test content and curriculum content on this administration was 
considerably less good than had been the case in 1968. The test 
provides a comprehensive coverage of Relations and Functions, and 
of parts of Calculus, but has no items dealing with such areas as 
solution of differential equations, polar co-ordinates, and 
complex numbers. There is no coverage of Algebra. Since it proved 
impracticable to split the test into two subtests dealing 
respectively with Relations and Functions and Calculus, so that 
students in these courses could be tested separately, the decision 
was taken to administer the test only to students enrolled in 


both these courses. 


DIFFICULTY


Table A6.1 presents difficulty indices for all test items for 
both the original test administration (1968) and the present 


administration. 


Considerable caution should be observed in drawing any 
conclusions regarding the relative difficulty of an item on the 
two administrations. First, the difficulties are calculated in 
somewhat different ways. The 1968 figure is the proportion of 
students writing the test who chose the correct response to a 


particular item. The 1976 figure is calculated as follows:
Within each school, the proportion of students writing the test
who chose the correct response was calculated. These indices were
then averaged over schools to give the corrected difficulty.
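

The difference between the two methods can be made concrete with a short sketch. The function names and the two-school example are hypothetical and are given only to show that a pooled proportion (the 1968 method) and an average of school proportions (the 1976 method) need not agree when schools differ in size.

```python
def pooled_difficulty(schools):
    """1968-style index: correct responses over all students writing.

    `schools` is a list of (number_correct, number_writing) pairs.
    """
    correct = sum(c for c, _ in schools)
    writing = sum(n for _, n in schools)
    return correct / writing


def school_averaged_difficulty(schools):
    """1976-style index: proportion correct per school, averaged over schools."""
    proportions = [c / n for c, n in schools]
    return sum(proportions) / len(proportions)


# Hypothetical data: a small school doing well, a large school doing poorly.
example = [(8, 10), (10, 40)]
pooled = pooled_difficulty(example)              # 18 / 50
averaged = school_averaged_difficulty(example)   # (0.80 + 0.25) / 2
```

In the averaged index every school carries equal weight, so a small school influences the result as much as a large one.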


Second, as has been mentioned above, there has been
considerable change in curriculum over the eight years since the 
original administration. This cannot help but affect the response 
pattern to particular items, and it would be unreasonable to draw 
conclusions about the relative level of achievement of current 


students without considering this factor. 


Third, the two administrations involved somewhat different 
populations. Those writing the test in 1968 were students who 
intended to attend university and were writing the test in an
effort to increase their chance of acceptance. Students writing in 
1976 were doing so whether or not they intended to proceed to 
university, and without the motivation provided in 1968, since 


their mark on the test would be of little or no use to them. 


Bearing all these caveats in mind, one still finds a 
striking similarity between the two sets of difficulty indices. 
The mean difficulty has dropped by only 0.02; the standard
deviation has increased by the same amount. Of 36 items, there
are only nine on which the absolute shift in difficulty equals or
exceeds 0.10; of these nine, four proved substantially easier
than in 1968 and five proved substantially harder. The
coefficient of correlation between the two sets of figures is
0.84. 


BISERIAL CORRELATION 
The mean biserial correlation coefficient, using uncorrected total 


test score as the criterion, was 0.46 with a standard deviation 


i) teal) sd ee 
For the 1968 administration, using corrected total test
score as the criterion, the mean was 0.46 with a standard
deviation of 0.09.


The distribution of biserial correlations for both 
administrations is given in Table A6.2. Again, caution should be 
observed in making direct comparisons because of the difference in 


criteria.


RELIABILITY


The Hoyt estimate of reliability for uncorrected scores was 
0.79. On the original administration, the estimated reliability 
coefficient for corrected scores, found by using an adaptation of 


the KR(20) formula derived by P.L. Dressel, was 0.78. 


SPEEDEDNESS 


A test is considered unspeeded by one rule of thumb if: (a) at 
least 80 per cent of those writing reach the last item; and (b) 
all candidates reach the three-quarter mark. On this 
administration, statistics on the percentage of students not 
reaching a given item were computed in the same way as were 
difficulty indices--i.e., by averaging over schools. Detailed 
figures on item-by-item dropout are given in Table A6.3, along 
with the same data for the 1968 administration (not computed in 


this way). 


On both administrations the test was somewhat speeded, but
less on the present administration. All students did reach the
three-quarter mark, with 1 per cent dropout only at item 30.
However, only 30 per cent reached the last item (though 86 per
cent reached item 34). It should be pointed out that on the last
items on a test, the "not reached" figure is subject to
considerable distortion, since it is impossible to distinguish
between those who did not reach the items and those who reached
them but did not attempt them. In view of the 86 per cent figure
at item 34, it seems reasonable to assume that somewhat more
than 30 per cent did in fact reach the end, but chose not to
attempt the last item or two. There are no grounds for assuming,
however, that this figure would approach the 80 per cent
requisite for unspeededness.
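

The rule of thumb can be written as a two-condition check. The sketch below is illustrative only; the function name and its arguments are hypothetical, and percentages are taken as whole numbers in the way the report quotes them.

```python
def is_unspeeded(pct_reaching_last_item, pct_reaching_three_quarter_mark):
    """Apply the rule of thumb: (a) at least 80 per cent reach the last item,
    and (b) all candidates reach the three-quarter mark."""
    reaches_end = pct_reaching_last_item >= 80
    reaches_three_quarters = pct_reaching_three_quarter_mark >= 100
    return reaches_end and reaches_three_quarters


# Figures quoted in the text for the present administration:
# 30 per cent reached the last item, all reached the three-quarter mark.
present = is_unspeeded(30, 100)
```

Condition (b) is met on the present administration, so the verdict of "somewhat speeded" rests entirely on condition (a).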


CONTENT VALIDITY 

As a check on content validity, teachers of Calculus, of
Relations and Functions, and of various first year university
Mathematics courses were asked to complete Test Appraisal
Inventories related to the test. Secondary teachers were asked to
classify each test item under one of the following headings:


A. Old knowledge that students should have on entry to the
course.

A1. This knowledge is not reviewed in the course.

A2. This knowledge is reviewed in the course.


B. New knowledge that all students are expected to learn
in the course.


C. New knowledge that some students are expected to learn
in the course.

C1. Only 1% to 25% of students should learn this.

C2. Only 26% to 50% of students should learn this.

C3. Only 51% to 75% of students should learn this.

C4. More than 75% but not all students should learn
this.


D. New knowledge that no student is expected to learn.
The corresponding categories for university teachers were: 


A. Old knowledge that students should have on entry to the
course.

A1. This knowledge is not reviewed in the course.

A2. This knowledge is reviewed in the course.


B. New knowledge that all students are expected to learn
in the course.


C. Other.


Teachers classifying an item other than Al, A2 or B were 
asked to give a brief explanation of their classification 
(enrichment material, not relevant to course, etc.). Almost 
without exception, these comments were statements that the item 
content was not relevant to a particular course, or that it was 
enrichment material; therefore, the comments are not reported 


here in detail. 


Table A6.4 presents, for each item, the number of 
Relations and Functions teachers, the number of Calculus 
teachers, and the number of first year university Mathematics 
teachers choosing each classification for the item. The numbers 
of respondents at each level were: Relations and Functions 


teachers, 98; Calculus teachers, 74; university teachers, 41.


Table A6.5 presents, for each item, the following
information:


(a) general description of the content. Items judged by the 
test selection committee to be a review of pre-SSGD 


work are described as "review". 


(b) assessment given the item by teachers of Relations and
Functions, teachers of Calculus, and teachers of first
year university Mathematics courses. This assessment
was arrived at in the case of secondary teachers by
aggregating the number of respondents assessing the
item as (in this order) A1, A2, B, C4, C3, C2, C1
and D, until the aggregate number equalled or exceeded
75 per cent of those responding to the item. In the
case of university teachers, the aggregation was done
in the order A1, A2, B, C, and the cutoff point was
70 per cent to allow for the great diversity in first
year university courses. Thus, for example, the rating
by Relations and Functions teachers of item 1 as B
means that fewer than 75 per cent of these teachers
rated it as A1 or A2 but 75 per cent or more rated
it A1, A2 or B.


(c) difficulty of the item (the average over schools of the 


proportion of students correctly answering the item). 
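

The aggregation rule in (b) amounts to a running total over an ordered list of categories. The sketch below is one reading of that rule, not the procedure actually used by the study. The category order A1, A2, B, C4, C3, C2, C1, D for secondary teachers and A1, A2, B, C for university teachers is assumed, as are the cutoffs of 75 and 70 per cent; the function name and the sample counts are hypothetical.

```python
SECONDARY_ORDER = ["A1", "A2", "B", "C4", "C3", "C2", "C1", "D"]  # assumed order
UNIVERSITY_ORDER = ["A1", "A2", "B", "C"]                         # assumed order


def aggregate_rating(counts, order, cutoff_pct):
    """Return the first category at which the running total of respondents
    equals or exceeds cutoff_pct per cent of those responding to the item."""
    total = sum(counts.get(category, 0) for category in order)
    if total == 0:
        raise ValueError("no responses for this item")
    running = 0
    for category in order:
        running += counts.get(category, 0)
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= cutoff_pct * total:
            return category
    return order[-1]


# Hypothetical item: 30 teachers chose A1, 30 chose A2, 30 chose B, 10 chose C4.
rating = aggregate_rating({"A1": 30, "A2": 30, "B": 30, "C4": 10},
                          SECONDARY_ORDER, 75)
```

Here only 60 per cent chose A1 or A2, but 90 per cent chose A1, A2 or B, so the item is rated B.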


This table provides two criteria for determining content
validity. First is the assessment by secondary teachers of
whether students should be able to answer a particular item on
entry to or on exit from their SSHGD level courses. Of the 36
items, there are eight for which teachers of both courses agree
that the student should have this knowledge on entry to SSHGD
level courses. There are 25 further items whose content is,
according to teachers, taught in at least one of Calculus and
Relations and Functions to all students. Of the remaining items,
none meets the criterion of 75 per cent classification as A1, A2
or B by teachers of one of these courses. Item 17 is so
classified by 71.1 per cent of Relations and Functions teachers;
item 32 is so classified by 61.7 per cent of Calculus teachers;
and item 36 is so classified by 60.3 per cent of Calculus
teachers.


The second criterion is the assessment by university 
teachers of whether the student should have met the item content 
before entry to a first year Mathematics course, or should learn 
it in that course. Of the 36 items, university teachers judged 
that a student should know the content of 19 on admission, and 
should learn the content of a further 12 during his/her first 
year course (indicating that this content is relevant to first 
year work in Mathematics). For the remaining five items, the 
percentages of first year teachers assessing them as Al, A2 or B 


were respectively 59.0, 51.3, 64.1, 48.8 and 69.3. 


Teachers were also asked to comment on the suitability of 
the test as a whole for assessing the achievement of students in 
their courses (at the secondary level) or entering their courses 


(at the university level). 


Several secondary teachers were disturbed by the combination 
of Relations and Functions questions with Calculus questions on 
the test, although others did not find this a problem. A number 
of Calculus teachers pointed out areas of omission--integral 
calculus, limits, polar graphs, length of curves; one of these 
teachers said that no more than 5 per cent of his/her course was 
covered on the test. One teacher gave a lengthy list of skills 
which he/she felt should have been tested, including operations on 
rational expressions, use of logarithms, ability to discriminate 
between. such expressions as Sire: xe saiven-.oanrers ( XyLpecand 
simplification of complicated expressions (both numerical and 
algebraic) using factoring. This teacher felt there should be more 
on trigonometry; one of these felt that the trigonometry that was 
there was on a Grade Eleven level. One said that the calculus 
questions in general tested pre-SSHGD knowledge. One teacher 
described the test as designed "to prove today's Gr. 13 knows no 


math" and frustrating for the students because the items "are
imaginative, inventive and contain multi-concepts".


Another teacher felt that the existence of alternate methods for 
doing some items was a problem. The test was described by a few 
teachers as "designed for the sophisticated mathematician" and
"requiring a high level of reasoning and abstraction", thus
"testing the ability of the student more than the teacher." A
number of teachers described the test as "a good experience" or
"fair". One teacher stated, "This material has been taught; the
knowledge is there to solve problems if students can tie the 


ideas together." Two others made similar comments. 


The number of comments from university teachers was small. 
Some of these teachers taught courses for which SSHGD level 
Mathematics was not a prerequisite (computer science, elementary 
probability theory, etc.). One teacher found the test "dull",
claiming there was "not one question that is at all interesting
to students at any level whether they can do them or not."
Another described it as "Victorian". One felt it contained too
much pre-SSHGD material. One felt it contained too much material 
for a 60-minute test, was repetitious, and should have contained 
coverage of linear algebra. One felt there was too much emphasis 
on transformations and mapping notation. One said there was very 
little on the test that should be “entirely new" at this level, 
but that students did not in fact have a working knowledge of the 
material. Two described it as a good test, but one felt that not 


all entering students knew the content. 


In summary, 33 of the 36 items meet the criterion of
appropriate assessment by secondary teachers, with a further item
very close to that criterion. 31 of the items are assumed by or
taught by at least 70 per cent of first year university teachers.
Teachers' general comments, although somewhat critical of the 
actual content of the test (largely as too sophisticated) were 
more concerned with omissions--largely of many portions of the 


calculus course. 


TABLE A6.1 


Mathematics Achievement Test 
Corrected Item Difficulties 


        D for 1968       D for Present    Apparent
Item    Administration   Administration   Shift
il Geez 0.80 -0.02 
2 OP, Ore -0.04 
3 0.83 eis ie +0.02 
4 0.40 Oey +0.02 
5 0.56 OS 7 +0.01 
6 0.06 0.80 +0.04 
7 Boeksbl O82 +0.01 
8 eee Oo o0 +0.06 
) OarZ O27 7 +0.05 
10 0.88 0.89 +0.01 
Dell 0.36 OR oz -0.04 
12 0.40 ON G62 +0.22 
13 Ou Oieae.2 +0.01 
14 Ohi BS Oye -0.07 
1 C6. 7. O62 -0.05 
16 Ue Ona 2 -0.04 
dy 0. 62 O67 +0.02 
18 aes U4 -0.13 
19 O16) O78 +0.17 
20 OS uk O26 (ions 
van 0.70 Oicy?, _ =O 0S 
Pod. Oo Of -0.01 
23 Gr) 0.45 0 14 
24 Oeer'6 O70 -0.06 
Zo Over zy Oe 2S -0.02 
26 SS) O29 -0.04 
iG O26 Oa +0.06 
28 Bhs Sie Ciee28 od Ue moe BI 
2? Oke) Dessau +0.12 
30 OE56 OF, oe -0.02 
5], Died! G..Ab6 —.59 
bis 0.61 Oe -0.36 
bi) Tigo! Creag: -0.02 
ey 0.40 0.45 +0.05 
by; Oe be Be 28) +0.04 
36 O07 OF U7 0.00 


"See text for explanation of the different methods used 
in calculating these figures for the two
administrations.


TABLE A6.2 


Mathematics Achievement Test 
Distribution of Biserial Correlations


                 1968               Present
Correlation      Administrationᵃ    Administration
O'. (OMe sO. B9 0 0 
Oe eee, 0 0 
O20 te to 29 1 0 
O30 ee AO 359 8 A 
07407-9049 14 ay 
Ope eN El ec ORS wae) aa 2 
OF 6 RFE sDe62 2 3 
O.705- We 79 0 0 
OS 0 Ob 0 0 
0. 9.08 ts 89D 0 0 


ᵃNote that in 1968 the criterion was corrected total test
score; on the present administration, the criterion was
uncorrected total score.


TABLE A6.3 


Mathematics Achievement Test 
Distribution of Students for Whom the Given Item
Was the First "Not Reached"


        1968 Administrationᵃ            Present Administration

        % of Students    Cumula-        % of Students    Cumula-
Item    Not Reaching     tive %         Not Reaching     tive %
24           0              0                0              0
25           1              1                0              0
26           0              1                0              0
27           0              1                0              0
28           1              2                0              0
29           0              2                0              0
30           2              4                1              1
31           3              7                1              2
32           0              7                0              2
33           6             13                5              7
34           8             21                7             14
35          25             46               29             43
36ᵇ         22             68               27             70


ᵃPlease see text for explanation of the different methods
used in calculating these figures for the two
administrations.

ᵇFor the last item, the "not reached" figure includes also
those students who reached but did not attempt the item.


APPENDIX A7
TECHNICAL REPORT 


TEST OF ARITHMETIC AND BASIC ALGEBRA 


The Test of Arithmetic and Basic Algebra was administered to 1,687 
students potentially eligible in June 1976 for a Secondary School 
Graduate Diploma (SSGD). These students were drawn from 52 of the 
53 Anglophone schools involved in the study. The remaining school 
was excluded because it contains only candidates for the Secondary 
School Honour Graduation Diploma (SSHGD). The population tested 
included students currently studying Mathematics in the Enriched, 
Advanced and General streams, as well as_ students not presently 


studying Mathematics. 


The test consisted of 35 multiple-choice 5 option items 
providing a broad coverage of basic arithmetic and algebraic skills. 
The items may be broadly categorized in terms of content as: basic 
arithmetic, 9 items; basic algebra, 17 items; exponents, 3 items; 


quadratic equations, 2 items; analytic geometry, 4 items. 


A second categorization of the items may be done on the basis 
of the grade level at which, according to the most recent Ministry 
of Education guidelines for Mathematics, the student should first 
have encountered the content of a particular item. This grade level 
differs for certain items, according to whether a student is enrolled 
in the General or Advanced stream in Mathematics. Table A7.1
presents for each of these streams the number of items first
encountered at each grade level.


DIFFICULTY


The difficulty D of an item is generally reported as the
proportion of persons writing the test who chose the correct
response to the item. In the case of this test, what is reported
is an average difficulty for each item. The difficulty was
calculated in the usual way for the students within each
individual school. The resulting indices were then averaged over
all schools.


The mean item difficulty was 0.56 with a standard
deviation of 0.16. The distribution of item difficulties is given
in Table A7.2.


BISERIAL CORRELATION 
The mean biserial correlation, using raw total test score as the 


criterion, was 0.63, with a standard deviation of 0.09. The 


distribution of biserial correlations is given in Table A7.3. 


RELIABILITY


The Hoyt estimate of reliability for uncorrected scores was 
O98. 


SPEEDEDNESS 


A test is considered unspeeded by one rule of thumb if: (a) at
least 80 per cent of those writing reach the last item; and (b)
all candidates reach the three-quarter mark. 80 per cent of the
candidates on this test attempted the last item, and 99 per cent
reached the three-quarter mark. The test was thus virtually
unspeeded. An item-by-item report on dropouts is given in Table
A7.4.


TEST SCORES
A correction for guessing was applied to test scores by 
subtracting from the number of correct responses one-quarter mark 


for each incorrect response. 
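

As a worked illustration of this correction, the sketch below applies the usual formula for k-option items, corrected = R - W / (k - 1), which reduces to one-quarter mark per wrong answer for the 5-option items used here. The function name and the sample figures are hypothetical; omitted items are assumed to carry no penalty.

```python
def corrected_score(n_correct, n_incorrect, n_options=5):
    """Correction for guessing: subtract 1/(n_options - 1) mark per wrong answer.

    Omitted or unreached items are not counted in n_incorrect.
    """
    return n_correct - n_incorrect / (n_options - 1)


# Hypothetical candidate on the 35-item test: 20 right, 12 wrong, 3 omitted.
score = corrected_score(20, 12)   # 20 - 12/4
```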


The mean corrected test score was 16.58, with a standard
deviation of 9.07.


CONTENT VALIDITY 


As a check on content validity, teachers of Applications of
Mathematics 2 (the General stream SSGD level course), Foundations
of Mathematics 2 (the Advanced stream SSGD level course), and
various first year CAAT Mathematics courses completed Test
Appraisal Inventories related to the test. The secondary teachers
were asked to classify each item under one of the following
headings:


A. Old knowledge that students should have on entry to the
course.

A1. This knowledge is not reviewed in the course.

A2. This knowledge is reviewed in the course.


B. New knowledge that all students are expected to learn
in the course.


C. New knowledge that some students are expected to learn
in the course.

C1. Only 1% to 25% of students should learn this.

C2. Only 26% to 50% of students should learn this.

C3. Only 51% to 75% of students should learn this.

C4. More than 75% but not all students should learn
this.


D. New knowledge that no student is expected to learn.
The corresponding categories for CAAT teachers were: 


A. Old knowledge that students should have on entry to the
course.

A1. This knowledge is not reviewed in the course.

A2. This knowledge is reviewed in the course.


B. New knowledge that all students are expected to learn
in the course.


C. Other.


Table A7.5 presents, for each item, the number of teachers 
at each level choosing each classification for the item. The 
numbers of respondents at each level were: Applications teachers, 
773; Foundations teachers, 93; CAAT teachers, 37. 


Teachers classifying an item other than A1, A2 or B were
asked to give a brief explanation of their classification
(enrichment material, not relevant to course, etc.). These
explanations, where they occurred, were almost without exception
statements that the material was not relevant to a particular
course (for example, 126 of 127 comments by CAAT teachers were
of this nature). The comments are therefore not reported here in
detail.


Table A7.6 presents, for each item on the test, the
following information:
(a) general description of the content. 


(b) statement of the grade level at which students in the 
General or Advanced stream would first be expected to 
encounter the item content, according to Ministry of 


Education guidelines. 


(c) assessment given the item by teachers of the
Applications course, teachers of the Foundations
course, and teachers of first year CAAT Mathematics
courses. This assessment was arrived at by aggregating
the number of respondents assessing the item as (in
this order) A1, A2, B, C4, C3, C2, C1, and D, until
the aggregate number equalled or exceeded 75 per cent
of those responding to the item. (In the case of CAAT
teachers, the aggregation was done in the order A1,
A2, B, C.) Thus, for example, the rating by
Applications teachers of item 3 as A2 means that fewer
than 75 per cent of these teachers rated the item as
A1, but 75 per cent or more of them rated it as A1 or
A2.


(d) difficulty of the item (the average over schools of the 


proportion of students correctly answering the item). 


This table includes three different criteria for determining
content validity. First is the level at which, according to
Ministry guidelines, the student should first have been taught the
content of an item. All test items meet this criterion, since
their content is included in the guidelines either at or before
the SSGD level. Second is the assessment by teachers of
Applications and Foundations as to whether students should be able
to answer a particular item on entry to or on exit from their
courses. Again the content of all items was judged to fall either
at a pre-SSGD level (25 items in the General stream, 30 items
in the Advanced stream) or at the SSGD level (all remaining
items). Third is the assessment by CAAT teachers of their
expectations of entering students and the content of their first
year courses. These teachers expected the students to have
encountered the content of 15 of the items before entering
college, and taught the content of a further 17 (indicating that
the content of these items is relevant to first year work in
Mathematics). The remaining three items did not meet the
criterion of 75 per cent of the CAAT teachers expecting or
teaching the content, but came close, with the aggregated totals
at the B level being respectively 68 per cent, 65 per cent and 
(a lets Centum 


Teachers at both levels were also asked to make general 
comments on the suitability of the test as a whole. Ten secondary 
teachers stated that the test was not a good measure of SSGD 
level mathematical skills, since the content was almost all at a 
lower level. (This, of course, was to be expected, since the 
test was not designed as a measure of competence in SSGD level 
content, but as a test of basic skills.) Three other secondary 
teachers stated, on the other hand, that some or all of the test 
was too difficult for SSGD level students in the General stream, 
or for students not currently taking Mathematics. Two teachers
stated that they thought the test was suitable--a "good sample
for basic applications", "should be of average difficulty". One
teacher said the test was not an accurate measure of achievement, 


but did not specify why. 


Eight CAAT teachers described the test as suitable in that 
all entering students should be able to handle all the test 
content. However, five of these teachers and seven other teachers
said that in fact their incoming students do not have these
skills. Three CAAT teachers deplored the omission of a question
on ratio and proportion; two suggested that word problems should
have been included; and one made a general statement that the
test was "too limited to evaluate the broad range of items that
the students should bring with them."


In general, then, the test appears to be a reasonably valid 
instrument in terms of content. The majority of criticisms at the 
secondary level were statements that the test was not what it was 
not intended to be--a measure of achievement skills taught at the 
SSGD level. CAAT teachers generally approved of the content, 
while pointing out two areas omitted which they felt to be of 


importance.


Grade Level                Number of Items
7. 7
General Advanced
stream stream
2 10 16


Distribution of Average Item Difficulties


Rate CROLL Y,


ᵃFor the last item, the "not reached" figure includes also
those students who reached but did not attempt the item.


APPENDIX A8


TECHNICAL REPORT ON THE


TEST DE RENDEMENT EN MATHEMATIQUES


The Test de rendement en mathématiques was administered to 140
students potentially eligible in June 1976 for a Secondary School
Honour Graduation Diploma (SSHGD). These students were drawn from
the Francophone population of the 14 Francophone and bilingual
schools involved in the study (for a definition of the Francophone
population in bilingual schools, see Chapter two of the report, Part
A, subsection 1.1). The population tested included students enrolled
during the school year 1975-1976 in both Calcul and Relations et
fonctions.


1. TEST CONTENT


The content of the test was identical, except for language, with
that of the Mathematics Achievement Test. For a description of
the content, see the Technical Report on that test (Appendix A6).


2. TECHNICAL ISSUES


2.1 Difficulty


The difficulty D of an item is generally reported as the
proportion of persons writing the test who chose the correct
response to the item. In the case of this test, what is reported
is an average difficulty for each item. The difficulty was
calculated in the usual way for the students within each
individual school. The resulting indices were then averaged over
all schools.


The mean item difficulty was 0.44 with a standard
deviation of 0.25. The distribution of item difficulties is given
in Table A8.1.


2.2 Biserial Correlation


The mean biserial correlation, using uncorrected total test score 
as the criterion, was 0.49 with a standard deviation of 0.16. 


The distribution of biserial correlations is given in Table A8.2. 


2.3 Reliability


The Hoyt estimate of reliability for uncorrected scores was 
Un Ton 


2.4 Speededness 


A test is considered unspeeded by one rule of thumb if: (a) at
least 80 per cent of those writing reach the last item; and (b)
all candidates reach the three-quarter mark. This test was
somewhat speeded. Although 98 per cent of candidates reached the
three-quarter mark, only 27 per cent reached the last item. It
should be pointed out that on the last items on a test, the "not
reached" figure is subject to considerable distortion, since it is
impossible to distinguish between those who did not reach the
items and those who reached them but did not attempt them. In
view of the 84 per cent figure at item 34 (of 36 items), it
seems reasonable to assume that somewhat more than 27 per cent
did in fact reach the end, but chose not to attempt the last item
or two. There are no grounds for assuming, however, that this
figure would approach the 80 per cent requisite for
unspeededness.


Table A8.3 presents an item-by-item account of dropouts. 
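

The rule of thumb can be expressed as a short check. In this sketch the function names and the dropout percentages are hypothetical, and the "three-quarter mark" is assumed to be the item three-quarters of the way through the test (item 27 of 36).

```python
def percent_reaching(first_not_reached, item):
    """Per cent of candidates who reached `item`.

    `first_not_reached` maps an item number to the per cent of candidates
    for whom that item was the first one not reached.
    """
    return 100 - sum(pct for i, pct in first_not_reached.items() if i <= item)

def is_unspeeded(first_not_reached, n_items):
    """Apply the two-part rule of thumb quoted in the text."""
    three_quarter_item = (3 * n_items) // 4
    return (percent_reaching(first_not_reached, n_items) >= 80
            and percent_reaching(first_not_reached, three_quarter_item) == 100)

# Hypothetical dropout percentages for a 36-item test.
dropouts = {27: 2, 33: 8, 34: 6, 35: 32, 36: 25}
print(percent_reaching(dropouts, 27), percent_reaching(dropouts, 36))
print(is_unspeeded(dropouts, 36))
```

With these illustrative figures 98 per cent reach the three-quarter mark and 27 per cent reach the last item, so both conditions fail.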


2.5 Test Scores


A correction for guessing was applied to test scores by
subtracting from the number of correct responses one-quarter mark
for each incorrect response.
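

As a worked illustration of this correction (the function name and the numbers are hypothetical), omitted items neither add to nor subtract from the score:

```python
def corrected_score(n_right, n_wrong, penalty=0.25):
    """Number right minus one-quarter mark for each wrong answer."""
    return n_right - penalty * n_wrong

# A hypothetical candidate on a 36-item test: 20 right, 8 wrong, 8 omitted.
score = corrected_score(20, 8)
print(score)
```

A penalty of one quarter is the value at which blind guessing on five-option items gains nothing on average; the report does not state the number of options per item, so that reading is an assumption.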


3. TEST APPRAISAL


As a check on content validity, teachers of Calcul, of Relations
et fonctions, and of various first year university mathematics
courses were asked to complete Test Appraisal Inventories
(Test-inventaires estimatifs) related to the test. Secondary
teachers were asked to classify each test item under one of the
following headings (see the Technical Report on the Mathematics
Achievement Test, Appendix A6, for the English translation):


A.  Connaissances antérieures que les étudiants devraient
    avoir au début du cours.

    A1. Ces connaissances ne sont pas revues dans le cours.

    A2. Ces connaissances sont revues dans le cours.


B.  Connaissances nouvelles que tous les étudiants doivent
    acquérir pendant le cours.


C.  Connaissances nouvelles que quelques étudiants doivent
    acquérir pendant le cours.

    C1. De 1% à 25% des étudiants seulement doivent les
        acquérir.

    C2. De 26% à 50% des étudiants seulement doivent les
        acquérir.

    C3. De 51% à 75% des étudiants seulement doivent les
        acquérir.

    C4. Plus de 75% des étudiants--mais pas tous--doivent
        les acquérir.


D.  Connaissances nouvelles qu'aucun étudiant ne doit
    acquérir pendant le cours.


The corresponding categories for university teachers were: 


A.  Connaissances antérieures que les étudiants devraient
    avoir au début du cours.

    A1. Ces connaissances ne sont pas revues dans le cours.

    A2. Ces connaissances sont revues dans le cours.


B.  Connaissances nouvelles que tous les étudiants doivent
    acquérir pendant le cours.


C.  Autre:


Table A8.4 presents, for each item, the number of teachers
of Relations et fonctions and the number of teachers of Calcul
choosing each classification for the item. The number of
respondents was: teachers of Relations et fonctions, 12; teachers
of Calcul, 11. There was only one university respondent. Clearly
no conclusions can be drawn from one response, but as a matter of
interest this teacher's responses are indicated in the table by an
asterisk.


Table A8.5 presents, for each item, the following
information:


(a) A general description of the content. Items judged by
    the test selection committee to be a review of
    pre-SSHGD work are described as "review".


(b) The assessment given the item by teachers of Relations
    et fonctions and by teachers of Calcul. This assessment
    was arrived at by aggregating the number of respondents
    assessing the item as (in this order) A1, A2, B, C4,
    C3, C2, C1 and D, until the aggregate number
    equalled or exceeded 75 per cent of those responding to
    the item. Thus, for example, the rating by teachers of
    Relations et fonctions of item 4 as A2 means that
    fewer than 75 per cent of these teachers rated it as
    A1, but 75 per cent or more rated it as A1 or A2.
    (Because of the small number of respondents, this
    method of assigning an overall assessment is imprecise,
    and should be considered only a rough approximation.)


(c) The difficulty of the item (the average over schools of
    the proportion of students correctly answering the
    item).
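

The aggregation rule in (b) can be sketched as a cumulative count over the ordered categories. The function name and the teacher counts below are hypothetical; the category order is the one read from the text.

```python
# Order in which the categories are aggregated, as described in (b).
ORDER = ["A1", "A2", "B", "C4", "C3", "C2", "C1", "D"]

def overall_assessment(counts, threshold=0.75):
    """Return the first category at which the running total of respondents
    equals or exceeds `threshold` of those responding to the item."""
    total = sum(counts.get(c, 0) for c in ORDER)
    if total == 0:
        return None  # nobody rated the item
    running = 0
    for category in ORDER:
        running += counts.get(category, 0)
        if running >= threshold * total:
            return category
    return None

# Hypothetical item rated by 12 teachers: 5 chose A1, 4 chose A2, 3 chose B.
rating = overall_assessment({"A1": 5, "A2": 4, "B": 3})
print(rating)
```

Here fewer than 75 per cent chose A1 (5 of 12), but A1 and A2 together reach 9 of 12, so the item is assessed as A2. With 11 or 12 respondents a single response moves the aggregate by roughly 8 to 9 percentage points, which is why the text calls the method a rough approximation.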


This table provides a criterion (admittedly rough) for
determining content validity. Of the 36 items, there are five for
which teachers of both courses agree that the student should have
the necessary knowledge on entry to SSHGD level courses. The
content of one item is expected on entry by teachers of Relations
et fonctions, although classified C2 by teachers of Calcul.
There are 25 further items whose content is, according to
teachers, taught in at least one of Calcul and Relations et
fonctions to all students. Of the five remaining items, none
meets the criterion of 75 per cent classification as A1, A2 or B
by teachers of one of these courses. Item 17 was so classified by
66.7 per cent of Relations et fonctions teachers; item 21 by
63.7 per cent of Calcul teachers; item 25 by 66.7 per cent of
Relations et fonctions teachers and 63.7 per cent of Calcul
teachers; item 32 by 65.6 per cent of Calcul teachers; and item
36 by 54.6 per cent of Calcul teachers.


Thus 31 of the 36 items meet the criterion of appropriate
assessment by secondary teachers, with four of the remaining five
close to that criterion. (Considering the number of respondents, a
difference of 10 per cent represents about one response.)


TABLE A8.1


Test de rendement en mathématiques


Distribution of Average Item Difficulties


Difficulty        Number of Items
0.00 - 0.09              3
0.10 - 0.19              4
0.20 - 0.29              6
0.30 - 0.39              4
0.40 - 0.49              4
0.50 - 0.59              4
0.60 - 0.69              6
0.70 - 0.79              1
0.80 - 0.89              2
0.90 - 0.99              2

TABLE A8.2


Test de rendement en mathématiques


Distribution of Biserial Correlations


Correlation       Number of Items
0.00 - 0.09              0
0.10 - 0.19              0
0.20 - 0.29              5
0.30 - 0.39              6
0.40 - 0.49              9
0.50 - 0.59              8
0.60 - 0.69              5
0.70 - 0.79              2
0.80 - 0.89              1
0.90 - 0.99              0


TABLE A8.3


Test de rendement en mathématiques


Distribution of Students for Whom the Given Item
Was the First 'Not Reached'


         % of Students
Item     Not Reaching      Cumulative %
26             0                 0
27             2                 2
28             0                 2
29             0                 2
30             1                 3
31             0                 3
32             0                 3
33             7                10
34             6                16
35            32                48
36            25                73


APPENDIX A9


TECHNICAL REPORT ON THE
TEST D'ARITHMÉTIQUE ET D'ALGÈBRE DE BASE


The Test d'arithmétique et d'algébre de base was administered to 291 
students potentially eligible in June 1976 for a Secondary School 
Graduation Diploma (SSGD). These students were drawn from the 
Francophone population of the 14 Francophone and bilingual schools 
involved in the study (see body of report for definition of 
Francophone population in bilingual schools). The population tested 
included students currently studying Mathematics at the Enriched,
Advanced and General levels, as well as students not currently
studying mathematics.


1. TEST CONTENT


The test content was identical, except for language, with that of
the Test of Arithmetic and Basic Algebra. For a description of
the content, see the technical report on that test (Appendix A7).


2. TECHNICAL ISSUES ABOUT THE TEST


2.1 Difficulty


The difficulty D of an item is generally reported as the
proportion of persons writing the test who chose the correct
response to the item. In the case of this test, what is reported
is an average difficulty for each item. The difficulty was
calculated in the usual way for the students within each
individual school. The resulting indices were then averaged over
all schools. The mean item difficulty was 0.42 with a standard
deviation of 0.15. The distribution of item difficulties is given
in Table A9.1.


2.2 Biserial Correlation 


The mean biserial correlation, using raw total test score as the
criterion, was 0.66 with a standard deviation of 0.13. The
distribution of biserial correlations is given in Table A9.2.


2.3 Reliability


The Hoyt estimate of reliability for uncorrected scores was 
Dsn9. 1s 


2.4 Speededness 


A test is considered unspeeded, by one rule of thumb, if: (i) at
least 80 per cent of those writing reach the last item; and (ii) all
candidates reach the three-quarter mark. This test evinces some
speededness. Only 74 per cent of the candidates reached the last
item, and only 95 per cent reached the three-quarter mark. It
should be noted that the "not reached" figure for the last item
is subject to considerable distortion, since it is impossible to
distinguish between those who did not reach the item and those
who reached it but did not attempt it. It is possible, therefore,
in view of the fact that 90 per cent of those writing reached the
next-to-last item, that in fact 80 per cent or more did reach
this item but a number of those did not attempt it. An
item-by-item report on dropouts is given in Table A9.3.


2.5 Test Scores


A correction for guessing was applied to test scores by
subtracting from the number of correct responses one-quarter mark
for each incorrect response.


The mean corrected test score was 10.67 with a standard
deviation of 9.68.


3. TEST APPRAISAL


3.1 Content Validity


As a check on content validity, teachers of Applications des
Mathématiques 2 (the General stream SSGD level course), Fondement
des Mathématiques 2 (the Advanced stream SSGD level course), and
various first year CAAT mathematics courses taught in French
completed Test Appraisal Inventories (Test-inventaires estimatifs)
related to the test. The secondary teachers were asked to
classify each item under one of the following headings (see
Appendix A7 for the English translation):


A.  Connaissances antérieures que les étudiants devraient
    avoir au début du cours.

    A1. Ces connaissances ne sont pas revues dans le cours.

    A2. Ces connaissances sont revues dans le cours.


B.  Connaissances nouvelles que tous les étudiants doivent
    acquérir pendant le cours.


C.  Connaissances nouvelles que quelques étudiants doivent
    acquérir pendant le cours.

    C1. De 1% à 25% des étudiants seulement doivent les
        acquérir.

    C2. De 26% à 50% des étudiants seulement doivent les
        acquérir.

    C3. De 51% à 75% des étudiants seulement doivent les
        acquérir.

    C4. Plus de 75% des étudiants--mais pas tous--doivent
        les acquérir.


D.  Connaissances nouvelles qu'aucun étudiant ne doit
    acquérir pendant le cours.


The corresponding categories for CAAT teachers were: 


A.  Connaissances antérieures que les étudiants devraient
    avoir au début du cours.

    A1. Ces connaissances ne sont pas revues dans le cours.

    A2. Ces connaissances sont revues dans le cours.


B.  Connaissances nouvelles que tous les étudiants doivent
    acquérir pendant le cours.


C.  Autre:


Table A9.4 presents, for each item, the number of teachers
at each level choosing each classification for the item. The
numbers of respondents at each level were: Applications teachers,
[illegible]; Fondement teachers, 12; CAAT teachers, [illegible].
Table A9.5 presents, for each item on the test, the following
information:


(a) A general description of the content.


(b) A statement of the grade level at which students in the
    General or Advanced stream would first be expected to
    encounter the item content, according to Ministry of
    Education guidelines.


(c) The assessment given the item by teachers of the
    Applications course, teachers of the Fondement course,
    and teachers of first year CAAT Mathematics courses.
    This assessment was arrived at by aggregating the
    number of respondents assessing the item as (in this
    order) A1, A2, B, C4, C3, C2, C1 and D, until the
    aggregate number equalled or exceeded 75 per cent of
    those responding to the item. (In the case of CAAT
    teachers the aggregation was done in the order A1, A2,
    B, C.) Thus, for example, the rating by Applications
    teachers of item 1 as A2 means that fewer than 75 per
    cent of these teachers rated the item as A1, but 75
    per cent or more of them rated it as A1 or A2.


(d) The difficulty of the item (the average over schools of
    the proportion of students correctly answering the
    item).


This table includes three different criteria for determining
content validity. First is the level at which, according to
Ministry guidelines, the student should first have been taught the
content of an item. All test items meet this criterion, since
their content is included in the guidelines either at or before
the SSGD level. Second is the assessment by teachers of
Applications and Fondement as to whether students should be able
to answer a particular item on entry to or on exit from their
courses. All items meet this criterion for the Advanced level,
where students are expected to have met the content of all items
either on entry to the SSGD level course (27 items) or during
that course (8 items). At the General level, students are
expected to have met the content of 25 items before the SSGD
level, and the content of a further 9 items at that level. The
one remaining item does not quite meet the criterion; slightly
fewer than 75 per cent of Applications teachers rated it as A1,
A2 or B, but 80 per cent rated it as A1, A2, B or C. The
third criterion is the assessment by CAAT teachers of their
expectations of entering students and the content of their first
year courses. These teachers expected the students to have
encountered the content of 24 of the items before entering
college, and taught the content of the remaining 11 (indicating
that the content of these items is relevant to first year work in
mathematics).


It thus appears that the test can be considered valid in
terms of content.


TABLE A9.1


Test d'arithmétique et d'algèbre de base
Distribution of Average Item Difficulties


Difficulty        Number of Items
0.00 - 0.09              0
0.10 - 0.19              2
0.20 - 0.29              8
0.30 - 0.39              5
0.40 - 0.49              9
0.50 - 0.59              8
0.60 - 0.69              2
0.70 - 0.79              1
0.80 - 0.89              0
0.90 - 0.99              0


TABLE A9.2


Test d'arithmétique et d'algèbre de base
Distribution of Biserial Correlations


Correlation       Number of Items
0.00 - 0.09              0
0.10 - 0.19              0
0.20 - 0.29              0
0.30 - 0.39              1
0.40 - 0.49              4
0.50 - 0.59              5
0.60 - 0.69             13
0.70 - 0.79              6
0.80 - 0.89              5
0.90 - 0.99              1


TABLE A9.3


Test d'arithmétique et d'algèbre de base


Distribution of Students for Whom the Given Item
Was the First 'Not Reached'


         % of Students
Item     Not Reaching      Cumulative %
20             1                 1
21             0                 1
22             0                 1
23             0                 1
24             2                 3
25             0                 3
26             1                 4
27             1                 5
28             1                 6
29             0                 6
30             0                 6
31             0                 6
32             0                 6
33             3                 9
34             1                10
35            16*               26


*For the last item, the "not reached" figure also includes
those students who reached but did not attempt
the item.


APPENDIX B


DETAILED ANALYSIS OF THE WRITTEN WORK OF A SAMPLE
OF FRANCO-ONTARIAN STUDENTS IN GRADES 12 AND 13²


(This report was prepared by Raymond Mougeon, Monique Bélanger and
Michael Canale, Franco-Ontarian Section, Ontario Institute for
Studies in Education.)


1. INTRODUCTION


As part of the study of the written French competence of young
Franco-Ontarians in Grades 12 and 13, it was decided to analyse a
sub-sample of 50 compositions (50 students) selected from a base
sample (400 compositions) in order to study in detail the quality
of the French written by the students. The present study has the
following main goals:


(1) for the sample of 50 compositions:

    (a) to establish for each student an error index based
        on all the errors made by the student;

    (b) to relate these error indices to the students'
        occupational and educational intentions;

    (c) to relate our error indices to the marks from 1 to 10
        assigned to the same compositions by a group of
        evaluators;

    (d) to calculate error percentages for six particularly
        frequent parts of speech.


(2) for a sub-group of 16 compositions selected from the
    sample of 50:

    (a) to give detailed indications of the frequency and
        importance of the different types of errors found
        in the written work;

    (b) to propose explanations of their nature and origin.

In general, we hope that our study will interest those who,
at both the secondary and university levels, are concerned with
the French-language competence of Franco-Ontarian students at the
end of secondary school, and that it will help them establish a
list of priorities in the teaching of French. We also think that
this study will provide useful information to those interested in
the question of the maintenance of French in the Canadian
provinces where francophones are in the minority. Finally, given
the size of our sample, it is worth stressing the exploratory
character of our study. We hope that these results will
demonstrate the usefulness of an in-depth study based on the
initial sample. Such a study would no doubt give a more precise
picture of the linguistic difficulties of Franco-Ontarian students
as a whole, as well as of some of the sociolinguistic factors that
may influence their mastery of written French.


2. PRESENTATION OF THE SAMPLE


As indicated above, the sub-sample of 50 compositions was drawn
from an initial sample (400 compositions/400 students). The
initial sample was made up of Grade 12 and 13 students who had
received their education in French. These students come from 14
Franco-Ontarian schools (unilingual and bilingual schools)
representing three main regions of francophone concentration in
Ontario: the North, the Mid-North and the East. The reader will
find more detail on the base sampling in the final report of the
project: La transition entre les niveaux secondaire et
post-secondaire (project [illegible]).


The 50 compositions were selected at random, subject to
three restrictions: account was taken of (a) the sex of the
students (25 boys/25 girls); (b) the year of study (24 Grade 12
students/26 Grade 13 students); and (c) the composition topics
chosen by the students (cf. Table B.6 for the distribution of the
50 compositions by topic and the list of composition topics).
The 50 authors come from the 14 Franco-Ontarian schools mentioned
above. All the students have an active knowledge of English, but
to differing degrees.
The Grade 12 students fall into the following categories
with respect to their educational and occupational plans:


(a) 13 students intend to continue their secondary studies;


(b) two students are going directly to a post-secondary
    institution;


(c) six want to look for a job after Grade 12; and


(d) three have no definite ideas about their future.


For the Grade 13 students, the distribution was as follows:


(a) 22 students are heading for post-secondary studies
    (university, community college, etc.);


(b) two students are going to repeat Grade 13;


(c) one student is going to look for work;


(d) one student has no definite idea about his or her future.


As might be expected, the majority of the Grade 13 students
are students who are going to continue their studies.


The sub-group of 16 compositions was selected at random
without any restriction. Its distribution according to the main
variables mentioned above is given in Table B.7.


3. METHODOLOGY


The method used in the present study to evaluate the students'
competence in written French is known as error analysis. It is a
method now commonly used in the field of language acquisition
(cf., among others, Corder 1976; Brown 1973; Richards [year
illegible]). We have used it ourselves on several occasions
(Mougeon and Hébrard 1975; Mougeon and Carroll 1976 (a) and (b);
Mougeon, Bélanger, Canale and Ituen 1976) in our studies of the
linguistic competence of young Franco-Ontarians in Rayside,
Sudbury and Welland. From a pedagogical point of view, the main
merit of error analysis is that it makes possible detailed
diagnostic studies of the linguistic competence of the "learners"
of a given language, studies that can have repercussions for
language teaching (Burt and Kiparsky 1972; Mougeon 1975). A major
problem posed by the method of error analysis should nevertheless
be noted, one that concerns the notion of error. More precisely,
the notion of error is closely tied to the notion of norm, an
error being that which does not conform to the norm. It follows
that the choice of a given linguistic norm for purposes of
analysis partly conditions the evaluation of an individual's
linguistic competence. In general, we think a partial solution to
this problem is possible if, on the one hand, one chooses a
linguistic norm that corresponds to the type of linguistic
behaviour (formal, informal, written, spoken, etc.) expected of an
individual in one or more given communication situations and if,
on the other hand, one clearly defines the characteristics of the
norm adopted for purposes of analysis. In the present study we
decided to adopt formal written Canadian French as the reference
norm. This choice is motivated by the orientation of the general
study of which ours is a part. One aim of the general study is to
see whether Ontario students at the end of secondary school
possess the abilities required to pursue studies at the
post-secondary level. More precisely, we started from the
principle that in post-secondary studies students will generally
be expected to master a variety of language that conforms to the
rules of usage of formal Canadian French. That said, we chose,
somewhat arbitrarily, to rely on the usage described in the
dictionary of Canadian French (Bélisle 1971), inasmuch as this
work offers an explicit model of "correct" Canadian French. The
fact remains that the Bélisle dictionary does not cover all
aspects of the usage of the French language, notably as regards
grammar. We therefore sometimes had to fall back on our own
intuitions about the norm of formal Canadian French.² With this
reference norm, we collected, by examining all the written work,
all the elements that did not conform to the norm (errors) in the
main areas of spelling, vocabulary and grammar. The errors were
collected by two linguists (Monique Bélanger and Raymond
Mougeon), who corrected each composition together. We thus
obtained a higher number of errors than can be obtained when a
single person undertakes the thankless task of collecting errors.
We also reduced somewhat the risk of subjective evaluation in
cases where we were dealing with linguistic elements not listed
in the Bélisle dictionary.


4. RESULTS


Let us begin with the error indices found for each of the 50
students. Before presenting the results in tabular form, some
explanation of our error indices is in order. They were
calculated by dividing the total number of errors made by a given
student by the total number of words written by that student.
This method has already been used by Scott and Tucker (1974) and
by Mougeon and Hébrard (1975). Such an index can give only a
general indication of the students' mastery of written French,
for several reasons: (a) there may be more than one error per
word (cf. below); (b) not all words present the same level of
difficulty; (c) not all students make the same types of errors in
the same proportions. Subject to these reservations, our indices
are of considerable interest inasmuch as they represent a
relatively objective measure of the students' linguistic
competence, a measure that rests on the totality of the errors
they made.
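

The error index is a simple ratio, sketched below. The function name and the counts are hypothetical and are not taken from the study's data.

```python
def error_index(n_errors, n_words):
    """Errors per word written: total errors divided by total words."""
    if n_words <= 0:
        raise ValueError("a composition must contain at least one word")
    return n_errors / n_words

# Hypothetical composition: 18 errors in 240 words.
index = error_index(18, 240)
print(round(index, 3))

# Mean index for a hypothetical group of three students (errors, words).
group = [(18, 240), (6, 300), (45, 250)]
group_mean = sum(error_index(e, w) for e, w in group) / len(group)
print(round(group_mean, 3))
```

Because a single word can carry more than one error, the index is not strictly the proportion of words written incorrectly.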


As Table B.1 shows, the two main groups of Grade 12 students
(those who are going to continue their studies and those who are
going to look for work) differ markedly in their ability to write
French. More precisely, there is a difference of more than 10
points between the mean index of the first group and that of the
second group. It is also interesting to note that the highest
index found for the first group corresponds, more or less, to the
lowest index found for the second group. Note also that the
minority going directly to a post-secondary institution likewise
has a low error index. The existence of a relation between the
occupational and educational intentions of the 50 students and
their ability to write French (as measured by our index) is an
interesting fact that we think worth stressing. Indeed, one may
suppose that an appreciable number of the students who say they
want to continue their secondary studies will no doubt go on,
after Grade 13, to a post-secondary institution. As we indicated
above, in such institutions the student is expected to have a
good knowledge of the written language.


Among the Grade 13 students (Table B.2) are found mainly
students who intend to study at a post-secondary institution; the
scores of this group of students are therefore of particular
interest. Note first that the mean index of this group is close
to that of the group of Grade 12 students who intend to continue
their studies. However, it must also be noted that, contrary to
what would normally be expected, we obtained a wide distribution
for the scores of the sub-group of Grade 13 students who want to
continue their studies (from .01 to .22). One might have expected
to find in this sub-group a clear majority of students with low
or relatively low error indices, given these students'
educational intentions. Yet only a slim majority of the students
in this group have indices below .10. On the basis of these
results, one may suppose that a certain number of the Grade 13
students who want to continue their studies and who have fairly
high error indices will experience definite difficulties if they
go on to study in a French-language post-secondary institution.
In this regard, it would be interesting to measure these
students' competence in written English in order to see whether
they would be better off studying in English. Given the very
small size of the other sub-groups of Grade 13 students, we shall
refrain from commenting on those students' scores.


There remains an important question that is difficult for us
to answer: what level of mastery of written French is generally
required by French-language post-secondary institutions? Being
unable to answer this question, we cannot indicate what
proportion of the students in our sample of 50 possess the
competence in written French needed for post-secondary studies.


Let us now turn to the comparison of our indices with the
overall marks given by the evaluators. In this connection it
should be mentioned that each composition was evaluated by nine
different judges. They were instructed to mark each composition
from one to ten on the basis of their general impression. They
were also to spend an average of two minutes per composition. It
therefore seemed interesting to us to compare these two different
evaluation methods. To do so, we tried to see whether there is a
correlation between the mean mark obtained by each student and
the corresponding error index. Applying the Pearson
product-moment test, we found a relatively high correlation
(-.74). The fact that the correlation is negative indicates that
the higher the judges' marks, the lower the error indices. We
have illustrated this correlation graphically (cf. Graph 1). This
result allows us to conclude that the method of rapid global
evaluation led to results similar to those obtained by the
detailed and near-exhaustive analysis of errors. This can be
compared with the similar result found by P. Evans, who studied
the compositions of the sample of Anglo-Ontarian students. These
results can therefore be seen as an indication that the method of
rapid global evaluation can be a relatively reliable shortcut.
The fact remains (this is obvious) that the method of global
evaluation does not make it possible to diagnose precisely the
respective importance and frequency of the different types of
errors made by the students.


Let us now examine the error percentages calculated on the
basis of the 50 compositions. As these calculations are extremely
time-consuming, we limited ourselves to six parts of speech with
a high frequency of occurrence. For each of these parts of speech
(except prepositions), we analysed two types of errors: those
relating to lexical usage and those relating to number agreement.
The error percentages were calculated by dividing the total
number of errors found for a given part of speech in the 50
compositions by an estimate of the total number of occurrences of
that same part of speech. To arrive at such an estimate, we
counted the total number of occurrences of a given part of speech
in a sub-group of 15 compositions selected at random. This
allowed us to establish the mean frequency of occurrence; with
this frequency we estimated the total number of occurrences of
the six parts of speech for the 50 compositions.
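

The estimation procedure can be sketched as follows. This is an illustration under a stated assumption: the text does not say whether the mean frequency was taken per composition or per word, and the sketch assumes a mean number of occurrences per composition. The function name and all counts are hypothetical.

```python
def estimated_error_percentage(n_errors, sample_counts, n_compositions=50):
    """Error percentage for one part of speech.

    n_errors      -- errors found for that part of speech in all compositions
    sample_counts -- occurrences of that part of speech counted in each
                     composition of the random sub-group
    """
    mean_per_composition = sum(sample_counts) / len(sample_counts)
    # Scale the sub-group mean up to the full set of compositions.
    estimated_total = mean_per_composition * n_compositions
    return 100 * n_errors / estimated_total

# Hypothetical counts of one part of speech in a sub-group of 15 compositions.
counts = [28, 32, 30] * 5
pct = estimated_error_percentage(120, counts)
print(pct)
```

Counting occurrences in a sub-group rather than in every composition trades some precision for a large saving in labour, which is the reason the text gives for the shortcut.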


Let us begin by examining the usage errors (Table B.3).


Before commenting on the results concerning usage errors, we
shall briefly define what we mean by a usage error and then give
a series of examples taken from the 50 compositions. Making a
usage error consists of not using what is commonly called the
right word (le mot juste). Usage errors vary in seriousness,
inasmuch as they may hinder linguistic communication to a greater
or lesser degree. In our study they cover two main categories:
the omission of a linguistic element or the substitution of one
linguistic element for another. They also include (to a lesser
extent) the use of a linguistic element in a context where it is
not required.


Let us now present some examples illustrating the different
types of errors that appear in Table B.3.


(a) Prepositions: (1) [illegible] ___ quelques blocs d'ici
    (omission) (2) Le Canada a beaucoup à offrir pour les
    jeunes (substitution) (3) L'étudiant n'a pas de besoin
    de ces cours (unnecessary use)


(b) Verbs: (1) Sans ce stade aucun succès significant
    pourra être aperçu (substitution) (2) Il y a des aides
    du gouvernement pour fournir aux besoin de tous
    (substitution) (3) En réflétant sur le passé, l'homme
    inventa une communication très utile (substitution)


(c) Nouns: (1) Pour atteindre le pas de la compétition
    (substitution) (2) [illegible] la non-confiance...
    décourage souvent les compétiteurs (substitution)
    (3) L'homme inventa une communication très utile
    (substitution)


(d) Article: (1) Il construit machines à détruire le
    monde (omission) (2) [illegible] (omission)
    (3) Le monde de communication est un spectre qui
    permet aux gens de faire face à une dure réalité qui
    nous entoure (substitution)


(e) Adjective: (1) Sans ce stade aucun succès significant
    pourra être aperçu (substitution) (2) La première
    journée de ta naissance (unnecessary use)


(f) Pronoun: (1) On a tendance à ___ humilier (omission)
    (2) [illegible] il y désire en connaître plus encore
    (substitution) (3) ___ sont jamais vrais (omission)


Table B.3 indicates that the six parts of speech cause the
students varying degrees of difficulty. It will be noted that the
students seem to have relatively serious problems with
prepositions and with verbs. As regards prepositions, several
studies (Oller and Inal 1971; Scott and Tucker 1974; Mougeon and
Hébrard 1975; Mougeon and Carroll 1976) have shown that learners
of a first or second language generally have difficulty mastering
preposition systems. In our view, this is due in part to the fact
that, although prepositions form a restricted set of elements,
they take on, in combination with verbs and nouns, a multitude of
different meanings that are not always generalizable. That said,
it is also possible to regard prepositions as elements that are
not absolutely essential to communication (Brown 1973), insofar
as preposition errors do not generally impede comprehension of
the linguistic message. This may partly explain why a not
inconsiderable proportion of the preposition errors are
omissions. Verbs, by contrast, are among the essential elements
of the linguistic message. The relatively high percentage of
errors found for them leads us to recommend that emphasis be
placed on their usage. The fact that the students have fewer
problems with nouns than with verbs is an unexpected result of
our study. We would like to see whether it is confirmed by a
study based on a larger sample. The fact that the students tend
to have a better command of adjectives than of verbs is also
worth underlining, insofar as it runs more or less counter to a
recent "old" theory (Jacobs and Rosenbaum 1968) which postulated
that adjectives are semantically analogous to verbs. As for the
relatively low percentages of errors found for articles and
pronouns, they can be explained by the fact that the pronoun and
article systems require the acquisition of a restricted number of
rules which (unlike those for prepositions) are highly
generalizable, although in certain cases the use of articles (the
definite/indefinite opposition) can be quite subtle.


Let us now turn to errors relating to number agreement.
Table B.4 covers only five parts of speech. Prepositions, which
are not subject to number agreement, have been excluded.

By an error of number we mean the use of the plural marker
where the singular is required, and vice versa. Here are some
examples drawn from the 50 compositions.


(a) ADJECTIVE: (i) Ils ont des contacts physique__ violent__.
(ii) Ils mangent à leurs faim. (iii) Comparativement à
[illegible] sous-développé__.

(b) VERB: (i) Elles se situe__ loin dans le temps. (ii) Ceux qui
aime la nature. (iii) Ces machines fait par l'homme.

(c) NOUN: (i) Son amis lui apprend du nouveaux. (ii) L'enfant
passe à travers beaucoup d'échec__. (iii) De jours en jours.

(d) PRONOUN: (i) Seulement pour toi et non pour d'autre__.
(ii) [illegible] (iii) Le genre de vie qui [illegible].

(e) ARTICLE: (i) Les temps de__ dépressions et de guerres.
(ii) Il est nécessaire de suivre de__ cours spécifiques.


It will be noted first that the article is practically
unaffected by errors of number. This is no doubt due in part to
the fact that in French the plural/singular marker is both
essential and evident at the level of the article. More
precisely, if one considers the two sentences les petits garçons
restent tranquilles and le petit garçon reste tranquille, one
sees that in spoken language the plural/singular difference is
indicated only by the alternation between the articles les and
le; hence, no doubt, the fact that we found very few errors of
number at the level of the article. If the article is very little
affected by errors of number, the same is true (or nearly so) of
nouns and personal pronouns. As regards nouns, this may be
explained in part by the close bond (relation de solidarité)
between the noun and the article. As for personal pronouns, it
will be noted that in most cases the difference between singular
and plural is evident, except for the third-person pronouns
il/ils, elle/elles, leur/leurs, autre/autres, which are not
differentiated in spoken language. It is, moreover, these latter
pronouns that were chiefly the object of errors of number. As for
verbs, it will be noted that, although the difference in number
is generally indicated clearly in spoken language by the
alternation of endings (e/ont, -s/ez, etc.), there is a risk of
confusion with the third-person endings e/ent and ait/aient,
which are pronounced identically. It is, moreover, largely with
these latter endings that we found errors of number. It will also
be noted that errors of number affected past participles (cf.
example b(iii)). Finally, as regards adjectives, it will be noted
that, unlike nouns, they are often distant from the article, with
the exception of the minority of adjectives that are used before
the noun. This may explain why we found a relatively higher rate
of errors of number affecting them.


Let us now turn to the detailed analysis of the different
types of errors. As indicated above, this study is based on a
subgroup of 16 compositions (cf. Table B.7) and not on the 50
compositions, since we did not have enough time.


Table B.5 shows ten different types of errors. These have
been ordered according to their respective importance within the
set of all errors found in the 16 compositions. These percentages
give us only a fairly rough indication of the extent of the
difficulties that each type of error creates for the students.
This is because the types of errors bear on linguistic elements
of varying frequency, and the variation in that frequency is
reflected to a greater or lesser degree in the absolute number of
errors of a given type. Ideally, we should have calculated, for
each of these ten types of errors, error percentages of the kind
presented in Tables B.3 and B.4, but given the scale of the task
we had to give this up. Subject to these reservations, it can be
noted that the majority of errors is accounted for by four types.
Two of these (lexical usage, number agreement) were examined
briefly in the preceding section. It can be seen that the
category of spelling errors comes second. In our view it should
be one of the main concerns of those responsible for the teaching
of French, all the more so since we did not count certain minor
spelling errors (cf. below). The remaining errors consist of a
group of types of fairly low frequency, apart from gender, which
seems to cause the students difficulty.


That said, let us examine these different types of errors in
more detail, in particular those not examined in the preceding
section. Let us begin with one of the most important types,
namely spelling errors. We should point out that we did not count
accent errors, except where the accent corresponds to a
difference in function (for example à/a, où/ou). Also excluded
were punctuation errors (dots on the letter i, commas, etc.) and
errors in the use of capital letters. We distinguished two types
of spelling errors: those attributable to homonymy and those that
are not. We identified 77 errors (44%) that arise from homophony
and 97 (56%) that do not. The first category covers 17 different
types of homonymic conflict (examples: sont/son, ces/ses, etc.).
These are listed in Table B.8, ordered by frequency.


The error consists in using one form in place of another.
Particularly notable among the most frequent errors are those
involving the pairs -é/-er (by far the most frequent), à/a,
qui/qu'il, ce/se and the three pairs s'est/c'est, ces/ses and
sait/sais. Here are examples of some of these errors: Je suis
d'accord que, ses trois causes sont responsables; et on a
tendance à humilié ses ennemis; et se seras impossible de se
separée de la violence; nous voulons être nous-même, et fair ce
qu'il nous plait. In our view, these errors arise largely from
poor learning of the different grammatical categories of French,
with the result that the student can no longer distinguish
participles from infinitives, demonstratives from possessives,
and so on. Yet this discernment is one of the essential keys to
French spelling, insofar as the latter is unfortunately far more
grammatical and etymological than phonetic. On this subject we
refer the reader to Martinet (1969) for further discussion.


The second category consists mainly of errors involving the
doubling of consonants and, to a lesser extent, the
simplification of groups of letters that are not pronounced as
they are written. Here are some examples: une communication se
fesait en quelques secondes; un problème sérieux celui de la
polution; détruire des miliers de bactéries; tout en prennant un
bref appercu de la situation; la libération de la famme.


As regards errors of lexical usage (the most frequent of the
ten types of errors), we note that in addition to nouns, verbs,
adjectives and prepositions (examined above), they also involved
adverbs, pronouns and conjunctions. Table B.9 gives information
on the respective importance of the different types of usage
errors. In this connection, the usage errors have been divided
into two categories: those more or less attributable to the
influence of English (interference errors) and those that are
not. Here are some examples of interference errors: on nous dit
par après que le blamme est sur la télévision; la roue nous a
élevé d'un stage primitif à un stage de progrès; sur le côté
humanitaire, on peut vivre plus longtemps...; la transportation
est un autre domaine où...

It is perhaps not without interest to note that we found
interference errors only with nouns, verbs, adjectives and
prepositions, and in relatively similar proportions for each of
these parts of speech, with the exception of prepositions, which
seem to have been somewhat more subject to interference errors. A
similar fact has already been noted by Mougeon and Hébrard (1975)
and Mougeon and Carroll (1976c) concerning the acquisition of
English prepositions by young Franco-Ontarians in Sudbury and
Welland. In our view, the complexity of the French and English
preposition systems is no doubt not unrelated to the fact that,
in both English and French, bilinguals commit numerous
preposition errors due to interference. More precisely, faced
with the complex tasks of acquiring and using two preposition
systems, bilingual students more or less consciously simplify
these tasks by transferring some of the rules of usage of one
system into the other and vice versa. In so doing, the students
produce structures that do not conform to the norms of Canadian
French and English as used by unilinguals (cf. Mougeon and
Carroll, 1976b; Mougeon, Bélanger, Canale and Ituen, 1976).


As for errors of number, having already examined them above,
we note only that, if the error of number consists in using the
singular for the plural and vice versa, the substitution of the
singular for the plural proved more frequent (60%) than the
inverse substitution (40%). Such a tendency has already been
observed in several studies devoted to the acquisition of English
(first language: Brown, 1973; second language: Mougeon and
Hébrard, 1975) and of Spanish (Chun and Politzer, 1975). This no
doubt reflects a tendency to overgeneralize the form that is
morphologically simpler (cf. below, errors of gender).


Errors of gender mainly involved four parts of speech:
articles and nouns, adjectives and past participles. The latter
two parts of speech account for more than 70% of the errors of
gender. This proportion is all the higher given that adjectives
and past participles are not very frequent items. It may
therefore be that gender agreement is perceived by the students
as clearly redundant at the level of the adjective and the past
participle, insofar as gender appears first and evidently (cf.
above) at the level of the article-noun group. Here are some
examples of errors of gender: comme étudiant active dans les
sports; ceci est une accomplissements très importantes; une bonne
idée est souvent apprécié; ce n'est pas l'honneur individuelle
que ces athlètes recherchent. As can be seen above, the error
consists in substituting the feminine for the masculine and vice
versa. As with number, we found that the substitution of the
simpler form (masculine) for the more complex form (feminine) is
slightly more frequent than the inverse substitution. This result
can be compared with the similar findings of Grégoire (1947)
(French as a first language) and of Swain (1975) and Tarone,
Frauenfelder and Selinker (1975) (French as a second language).

Most of the errors concerning verb person relate to the
markers of the second and third persons singular. These are
either omitted or substituted for one another. Here are some
examples: même si nous venons à abolir les films de violences,
notre société serais [illegible]. Ce seras impossible de se
séparée de la violence. En grandissant tu connaîtra la définition
de ce mot; [illegible] de se tenir sur ses deux pieds.


Article errors are exclusively errors of usage. They fall
into two main categories: the omission of definite, indefinite
and partitive articles (the majority of article errors), and the
substitution of a definite article for an indefinite article and
vice versa. The reader will find examples of these errors in a
preceding section.


In the category of syntax errors we find mainly errors
relating to the order of elements in the sentence. Examples:
mettra t-il à fin la pauvreté? Pourquoi devrait un étudiant
prendre un cours de français? Tout de même, dans ce monde de
moderne technologie on peut...

It will be noted that the students have, on the whole, few
problems with the order of elements in the sentence, since few
errors of this kind were found. The same holds for the omission
of the negative particle ne (example: il veut pas les écouter),
which is rarely omitted by the students. This result is
interesting because a recent study (Vincent and Sankoff, 1975)
showed that in spoken Canadian French the particle ne is omitted
in more than 95% of cases. It can thus be noted that the students
have hardly transferred into their writing the omission of ne
that is characteristic of spoken French.


The category of errors labelled "other" covers various types
of low-frequency errors. Among the most important are the
omission of the personal pronoun (example: ___ faut mettre en
considération le fait [illegible]); the omission of the relative
que (example: je pense ___ je vais faire un ingenieur); and the
redundant use of que (example: quand qu'ils se trompent ils
perdent confiance). It will be noted that these errors can be
explained in part as transfers of features characteristic of
spoken Canadian French.


Finally, we classified as unanalysable those errors for which
it proved impossible to decide plausibly on their nature and
origin. In this category we find, for the most part, incomplete
sentences that are difficult to interpret. Here are some
examples: En fait tout y est la participation et enfin la
victoire ou la défaite; Certainne personne en voyent des films ou
la télévision avec violence croit que ça peut leur arriver; il
s'étonne pour les livres permettent la distribution à une age.
These are, as can be seen, fairly serious errors insofar as they
impede comprehension of the linguistic message. For this reason,
we would have liked to find only a tiny number of them in the
students' compositions.

With this last type of error we conclude the section devoted
to the errors in the subsample of 16 compositions. In the
following section we examine the limits of our study, thereby
suggesting new dimensions for a later study based on a larger
sample.


CONCLUSION 


We should point out first of all that the analysis of the different
types of errors found in the subsample of 16 compositions disregards
individual differences, which can sometimes be striking. Thus we
noticed that certain students tended to make more errors of one type
than of another. Some students, for example, make mainly spelling
errors, and more particularly agreement errors; others have a very
approximate vocabulary; others, and this is more serious, produce a
relatively high number of incomplete and obscure structures. A study
based on a larger sample should, in our view, take these individual
differences into account.

We should also point out that in the present study we have not
addressed the thorny question of stylistic deviations. Certain
students used a not inconsiderable number of structures
characteristic of informal spoken language. Among these are the
doubling of the pronoun (examples: moi je; lui il; il ne le comprend
pas ça, etc.); interrogation with tu (example: il comprend tu ce que
le monde veulent); and the redundant use of y (example: quand il y
allait à l'école). We did not count these structures as errors
because they are relatively minor points; nevertheless, it must be
admitted that they indicate that a certain number of students do not
have a good command of formal written French (cf. Mougeon and
Carroll, 1976a and 1976b, for similar remarks). Insofar as
post-secondary institutions can be presumed to expect students to
show such a command, it cannot be too strongly recommended that
students at the end of secondary school be made aware of the
existence of levels of style. This seems to us all the more
imperative since, by virtue of their minority status, young
Franco-Ontarians have relatively little exposure to varieties of
formal French. Indeed, it may be recalled that the establishment of
French-language education at the secondary level is recent. The same
is true of French television and radio (two vehicles of varieties of
formal French). To this may be added that in many localities where
Franco-Ontarians are present it is difficult to obtain French books
and newspapers.

Finally, we should mention that we have not addressed the important
question of logic in the presentation of ideas, insofar as it goes
beyond the scope of linguistic analysis. However, we note that only
a tiny minority of the students showed such a capacity. In fact,
most of the time we found that ideas were poorly linked. Yet, given
the nature of the composition topics, one might have expected the
students to follow a demonstrative approach. Given that in the
context of post-secondary studies students will no doubt have to
function discursively, greater emphasis should perhaps be placed on
training in logical discourse in the final years of secondary
school.


TABLE B.1


Error indices of Grade 12 students according to their
career and educational intentions


Students'                      Total number     Total number
intentions        Student      of words         of errors        Error
                               written          made             index

Will continue        1           700             [?]              .02
their                2           420             [?]              .03
secondary            3           443              18              .04
studies              4           360              22              .06
                     5           [?]             [?]              .07
                     6           300             [?]              .10
                     7           [?]             [?]              .10
                     8           [?]              67              .12
                     9           460              60              .13
                    10           256             [?]              .13
                    11           436             [?]              .13
                    12           641              86              [?]
                    13           [?]             [?]              .14
                  Total          [?]             [?]              [?]

Will go              1           400              24              .06
directly to a        2           740             [?]              .08
post-secondary    Total          [?]              82              [?]
institution

Will look            1           176              22              .13
for work             2           344             [?]              .16
                     3           436              85              [?]
                     4           [?]              87              .22
                     5           [?]             [?]              .27
                     6           348             [?]              [?]
                  Total          [?]             [?]              [?]

Have no              1           457              56              [?]
definite             2           168             [?]              [?]
idea                 3           [?]             [?]              [?]
                  Total          [?]             161              [?]
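
The error indices in Table B.1 appear to be the number of errors divided by the number of words written, rounded to two decimals. That reading is an assumption inferred from the legible rows rather than a definition given here. The sketch below checks it against four rows whose three figures are all legible; the function name is illustrative.

```python
def error_index(n_errors, n_words):
    """Errors per word written, rounded to two decimals (assumed definition)."""
    return round(n_errors / n_words, 2)


# (words written, errors made, printed index) for rows that are fully legible.
legible_rows = [
    (443, 18, 0.04),
    (360, 22, 0.06),
    (460, 60, 0.13),
    (400, 24, 0.06),
]

# Compare the recomputed index with the printed one for every row.
consistent = all(
    error_index(errors, words) == printed
    for words, errors, printed in legible_rows
)
print(consistent)
```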


TABLE B.2


Error indices of Grade 13 students according to their
career and educational intentions


Students'                      Total number     Total number
intentions        Student      of words         of errors        Error
                               written          made             index

Will go to a         1           [?]               8              .01
post-secondary       2           449              20              .04
institution          3           [?]              30              .05
                     4           [?]              28              .06
                     5           324              24              [?]
                     6           [?]              54              .07
                     7           [?]              24              [?]
                     8           615              43              .07
                     9           488              41              .08
                    10           [?]              50              .08
                    11           [?]              29              [?]
                    12           [?]              42              [?]
                    13           [?]              49              .12
                    14           502              62              .12
                    15           [?]              38              [?]
                    16           [?]             [?]              [?]
                    17           386              57              [?]
                    18           [?]             [?]              [?]
                    19           444             [?]              .17
                    20           483             [?]              .20
                    21           [?]              43              [?]
                  Total          [?]            1025              [?]

Will repeat          1           498              43              .09
Grade 13             2           405              69              [?]
                  Total          903             [?]              [?]

Will look            1           616             [?]              .02
for work

Have no              1           406             [?]              [?]
definite             2           [?]              49              [?]
idea              Total          [?]              76              [?]


TABLE B.3


Percentages of usage errors for
six frequent parts of speech


                  Number        Estimated total       Percentage
Item              of errors     number of items       of errors
Preposition         268             2700                 10%
Verb                193             2700                 [?]
Noun                [?]             4800                 [?]
Article             [?]             [?]                  [?]
Adjective           [?]             1800                 [?]
Pronoun              46             1800                 [?]


TABLE B.4


Percentages of number errors for
five frequent parts of speech


                  Number        Estimated total       Percentage
Item              of errors     number of items       of errors
Adjective           106             1800                  6%
Verb                140             2700                  5%
Noun                [?]             4800                  5%
Personal pronoun     50             1800                 [?]
Article              55             [?]                  [?]


TABLE B.5


The different types of errors
found in the 16 compositions


                            Number         Percentage
Type                        of errors      frequency
Lexical usage                 [?]             32%
Spelling                      174             23%
Number                        154             20%
Gender                         42             [?]
Verb person                   [?]             [?]
Article                       [?]             [?]
Syntax errors                 [?]             [?]
Omission of ne                 10             [?]
Other                          50             [?]
Unanalysable errors           [?]             [?]
Total                         [?]            100%


TABLE B.6


Distribution of the 50 compositions by topic


Topic        Number of students
Topic 1            10
Topic 2             5
Topic 3             5
Topic 4            10
Topic 5             5
Topic 6             5
Topic 7             5
Topic 8             5


Topic 1    Should scenes of violence in books, films or on
           television be censored?

Topic 2    What are the values of the Olympic Games, not only for
           the competitors but more particularly for the countries
           they represent and for the whole world?

Topic 3    The progress of science is the guarantee of the progress
           of civilization.

Topic 4    Secondary schools should not require students to take
           particular courses or to have particular prerequisites;
           students should be free to choose the courses that best
           match their interests.

Topic 5    Competition develops strength of character.

Topic 6    The 1970s are the best years in which to be young in
           Canada.

Topic 7    One of the characteristics of our time is our reluctance
           to participate; we are increasingly becoming a nation of
           spectators.

Topic 8    Does art imitate life, or does life imitate art?


TABLE B.7


Distribution of the subgroup of 16 students by
intentions and grade


Will continue        Will go to a           Will look     Have no
their secondary      post-secondary         for work      definite idea
studies              institution
Gr. 12   Gr. 13      Gr. 12   Gr. 13        Gr. 12        Gr. 12   Gr. 13
  1       [?]          1        7             2            [?]      [?]


TABLE B.8


Detail of the different spelling errors


Errors attributable to homonymic conflicts (the number of errors
for each pair is illegible)

Homonymic pairs

-é ; -er
à ; a
qui ; qu'il
s'est ; c'est
ces ; ses
sais ; sait
ce ; se
[illegible]
sont ; son
dont ; donc
[illegible]
où ; ou
près ; prêt
quelle ; qu'elle
[illegible]
d'en ; dans
sans ; s'en
[illegible]
tant ; temps

                                                         Number
                                                         of errors
Errors attributable to homonymic conflicts                   77
Errors not attributable to homonymic conflicts               97
Total                                                       174


TABLE B.9


Detail of lexical usage errors


                      Origin of the error
Type of error     Interference      No                 Percentage
                  from English      interference
Prepositions         [?]               47                 [?]
Verbs                 26               45                 [?]
Nouns                 16              [?]                 [?]
Adjectives             7               16                 [?]
Pronouns               -                8                 [?]
Adverbs                -                6                 [?]
Conjunctions           -                1                 0.4


[Figure B.1: Correlation between the error indices and the mean
marks assigned by the evaluators. Axes: error indices; evaluators'
mean marks.]


APPENDIX C


Basis for Classifying University Courses

(For Classification Codes, see end of Table)


[The code letter printed beside each course is illegible.]

Administration
Agriculture
Amerindian Eskimo Studies
Anatomy
Animal Science
Anthropology
Applied Computational and Math Science
Applied Human Nutrition
Applied Statistics and Computation
Architecture
Asian Studies
Astronomy
Biochemistry
Biology
Biophysics
Botany
Business Administration
Canadian Studies
Chemistry
Child Studies
Classical Archaeology
Classics (Latin and Greek)
Commerce
Commonwealth Studies
Communication Arts
Comparative Literature
Computer Science
Consumer Studies
Dental Hygiene
Dentistry
Development Studies
Drama
Earth Sciences
Economics
Education
Engineering
Engineering and Management
English
Environmental Science
Family Studies
Film
Fine Arts/Art
Fisheries and Wildlife Biology
Forestry
French
French Language, Translation and Studies [partly illegible]
Genetics
Geography
Geology
German
Health
History
Home Economics
Hotel and Food Administration
Industrial Design
International Studies
Italian
Journalism
Latin American Studies
Law
Liberal Sciences
Library [illegible]
Life Sciences
Linguistics
Marine Biology
Mathematics
Medical Lab Sciences
Medicine
Medieval Studies
Metallurgy
Microbiology
Music
Native Studies
Nursing
Optometry
Pharmacy
Philosophy
Physical Education and Kinesiology
Physical Geography
Physics
Physiology
Plant Science
Political Science
Probabilities and Statistics
Psychology
Public Administration
Recreational Studies
Regional Environmental Planning
Rehabilitative Medicine
Religion/Religious Studies
Science, General
Slavic Studies
Social Theory Program
Social Work
Sociology
Soviet and East European Studies
Spanish
Sports Administration
Statistics
Studies in Social Action
Survey Science
Textiles
Theology
Urban Studies
Veterinary Medicine
Zoology


Note: The four program areas and corresponding letter codes are as
follows [letter codes illegible]:

Humanities/Arts
Social Sciences
Sciences
Professions
Applied Arts
Applied Social Sciences
Applied Sciences


APPENDIX D1
EQUATING SCORES ON BOTH FORMS OF THE TEST OF READING 
COMPREHENSION AND LANGUAGE ACHIEVEMENT (ENGLISH) AND 
THE TEST DE COMPREHENSION EN LECTURE ET DE CONNAISSANCE 


DE LA LANGUE (FRANCAIS) 


1. PROCEDURE

The rationale for this equating procedure was proposed by Lord (1950).
The procedure itself is described by Angoff (1971).


1.1 Required Data


Each of a group of examinees takes both forms of the test. The following
subscripts are used:


i = 1, ..., n_j   examinees
j = a, b          groups of examinees
k = 1, 2          test forms


The $n_a$ examinees in group "a" take the forms in the order Form 1,
Form 2; the $n_b$ examinees in group "b" take the forms in the order
Form 2, Form 1. The scores are denoted as: $X_{ijk}$ = mark achieved on
form k by examinee i of group j.
1.2 Equations


The basic equation is:

$$\frac{X_1 - M_1}{S_1} = \frac{X_2 - M_2}{S_2}$$

where


$M_k$ is the mean of form k


$S_k$ is the standard deviation of form k


and these parameters are estimated as follows: 


$M_k = \tfrac{1}{2}\,(M_{ka} + M_{kb} - C_k)$
Dae Ss 2 2s. 
om Bech ea 
C 
k 
Hae nett a : My Ne 
S] 7) 


The remaining undefined terms, $r_{kka}$ and $r_{kkb}$, are the reliability
estimates of form k from the scores obtained by groups a and b respectively.

Given estimates of the parameters in the basic equation, scores
from one form of the test can be taken into the metric of the other form
of the test.


2. APPLICATION


The data for equating scores on the forms of the Test of Reading
Comprehension and Language Achievement (English) are presented in
Table D1.1. The corresponding data for the Test de compréhension en
lecture et de connaissance de la langue (français) are given in
Table D1.2.

It was decided, arbitrarily, to equate scores to the form of the
test having the higher mean. For the English test, Form 2 scores were
taken into the metric of Form 1. For the français test, Form 1 scores
were taken into the metric of Form 2.


2.1 Equations for the Anglophone Test 


(i) Reading Comprehension Subtest:
X, = Laloxs + 0.30 
(ii) First Language Achievement Subtest:
$\hat{X}_1 = 1.03X_2 - 0.26$
(iii) Second Language Achievement Subtest:
$\hat{X}_1 = 0.97X_2 + 1.58$
(iv) Total Test:
$\hat{X}_1 = 1.01X_2 + 1.69$


2.2 Equations for the Francophone Test


(i) Reading Comprehension Subtest:
$\hat{X}_2 = 1.02X_1 + 0.99$
(ii) First Language Achievement Subtest: 
X> = OE XT O29 
(iii) Second Language Achievement Subtest: 
X> 2 et 
(iv) Total Test:
ro = leOSkjarts col2 
In the foregoing equations, the subscripts 1 and 2 refer to test form,
X refers to test score, and the symbol "^" over X designates the fact
that this is a score estimated from the student's observed score on
the other form.
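
The conversions above can be applied mechanically. The following is a minimal sketch, not part of the original report: the class and function names are hypothetical, only the two Anglophone equations that are fully legible here are included, and `equating_from_moments` assumes the usual linear equating form in which standardized scores on the two forms are set equal.

```python
from typing import NamedTuple


class LinearEquating(NamedTuple):
    """Coefficients of an equation X_hat = slope * X + intercept."""
    slope: float
    intercept: float

    def convert(self, score: float) -> float:
        """Take a score on one form into the metric of the other form."""
        return self.slope * score + self.intercept


# Coefficients copied from Section 2.1: Form 2 scores taken into the
# metric of Form 1 on the Anglophone test. The dictionary keys are
# illustrative labels, not names used in the report.
ANGLOPHONE_FORM2_TO_FORM1 = {
    "first_language": LinearEquating(1.03, -0.26),
    "total": LinearEquating(1.01, 1.69),
}


def equating_from_moments(mean_to: float, sd_to: float,
                          mean_from: float, sd_from: float) -> LinearEquating:
    """Assumed linear form: (X_to - mean_to)/sd_to = (X_from - mean_from)/sd_from.

    Solving for X_to gives slope = sd_to/sd_from and
    intercept = mean_to - slope * mean_from.
    """
    slope = sd_to / sd_from
    return LinearEquating(slope, mean_to - slope * mean_from)
```

For example, `ANGLOPHONE_FORM2_TO_FORM1["total"].convert(50)` takes a Form 2 total score of 50 into the Form 1 metric using the equation printed in Section 2.1.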
TABLE D1.1


Data for Equating Both Forms of the Test of Reading
Comprehension and Language Achievement (English)


Group 1ᵃ Group 2

Test Part Form Statistic (N=1612) (N=1619)
Mean AS66 4.67 
Reading 1 S. dp. Pheaem | Palas) 
Comprehension rox OG” 0.68 
Subtest Mean (gO bee ke) 
2 Sede. Zao Zine 
Cx psy) 0.60 
Mean UA 8) “A, 18 
Pa Est a sad. 23995 278 
Language DX 0.66 0.64 
Achievement Mean SLA ys Gey? 
Subtest ze S dis 2.78 20 
rox OF 6r/ Oo 
Mean TeeGe Tek) 
Second i Sadi, a. oD Dys! Zar, 
Language PXx O26] Ore 
Achievement Mean 622.8 62.02 
Subtest 2 Srichs De 45 5740 
TX.X Oxf 0.61 
Mean abe, 9 L744 
Ott I Sted. ghd ites 
Test r Xx O.83 O88 
Mean 15364 L445 
2 s.d VEEN Phas 3) 
rxx O.< 8.3 Onis 


ᵃ Group 1 took the forms in the order Form 1, Form 2; Group
2 took them in reverse order.


ᵇ $r_{xx}$ symbolizes the coefficient of reliability, estimated
using Kuder-Richardson formula 20 (Lord & Novick, 1968,
p. 91).
TABLE D1.2


Equating Data for Both Forms of the Test de compréhension
en lecture et de connaissance de la langue (français)


Group 1ᵃ Group 2

Test Part Form Statistic (N=248) (N=249)
Mean Deez. 5.09 

Reading H S.d. 75a ob) 3.08 
Comprehension rl xx Ono 0.66 
Subtest Mean wa: 6501 
2 S.idi« Die 5ROU 

rXxx Sane ys 0.64 

Mean TE SANG Lee 

First le SG Ze oe 2.10 
Language PRX OZ Ue 
Achievement Mean (gee Sy} (ea 5) 
Subtest 2. s ids LeOD 2.66 
ExXx G..04 0.94 

Mean ee A Zeta 

Second I S.0dis pie D866 
Language rx x Ws 2 0.59 
Achievement Mean eee | ET ARIS) 
Subtest Y Ss 218 2.80 
rx x 0.66 0.68 

Mean ice. here! 
Total 1 s.d Gal 6.66 
Test rT xx 0. /6 Oe76 
Mean P2256 ee eS 

2 s.d 6.63 10 

rx x ey 07.6.0 


ᵃ Group 1 took the forms in the order Form 1, Form 2; Group 2
took them in reverse order.


ᵇ $r_{xx}$ symbolizes the coefficient of reliability, estimated
using Kuder-Richardson formula 20 (Lord & Novick, 1968,
p. 91).
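
As an illustration of the reliability coefficient named in the table notes, here is a minimal sketch of Kuder-Richardson formula 20 for dichotomously scored items. It is not taken from the report: the function name is hypothetical, and the use of the population (divide-by-N) variance of total scores is an assumption about the convention followed.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    item_scores: one row per examinee, one 0/1 entry per item.
    Assumption: the variance of total scores is the population variance.
    """
    n_examinees = len(item_scores)
    n_items = len(item_scores[0])

    # Total score per examinee and its variance.
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_examinees
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)
```

The function expects at least two items and some variation in total scores; otherwise the formula is undefined.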
APPENDIX D2


PROCEDURE FOR SCORING ESSAYS


ANALYSIS OF THE CALIBRATION SAMPLE
Data. Each of m scorers marked each of n essays. The
following indexes will be used:
i = 1, ..., n essays
p = 1, ..., m scorers.
The marks will be denoted as:
$y_{ip}$ = mark given by scorer p to essay i.
Model. It will be assumed that there is a single common
characteristic being measured. The scorers vary in their mean levels
of scoring, their scales, and their reliabilities. The fundamental
equation is:

$$y_{ip} = a_p + b_p t_i + e_{ip}$$

where
$t_i$ = latent, common characteristic for essay i,
$a_p$ = mean score level for scorer p,
$b_p$ = scaling factor for scorer p,
$e_{ip}$ = error.


The error terms are assumed to be independent within and across scorers
and essays. The errors are distributed normally and homogeneously by
scorer:

$$e_{ip} \sim N(0, v_p)$$

where

$v_p$ = error (unique) variance for scorer p.
Identification. To define a unique solution, we will fix the
mean and variance of the $t_i$:

$$E(t) = 0$$

$$V(t) = 1$$

(This may later be changed to any desired scale.)


Estimation. Using the matrix $Y = (y_{ip})$ we perform a
1-dimensional common factor analysis. This involves computing:

$\bar{y}_p$ = mean mark for scorer p,
$s_p^2$ = variance of marks for scorer p,
$r_{pq}$ = correlation of marks for scorers p and q,
$h_p^2$ = estimated communality for scorer p,
$h_p$ = factor loading for scorer p,
$1 - h_p^2$ = unique variance for scorer p.


Transformation. The factor analysis results can be
translated into estimates of the original model parameters:

$$\hat{a}_p = \bar{y}_p,$$

$$\hat{b}_p = s_p h_p,$$

$$\hat{v}_p = s_p^2 (1 - h_p^2).$$
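
The transformation step can be sketched in a few lines. This is an illustrative sketch rather than the authors' program: the names are hypothetical, and it assumes the reading of the transformation as a = mean mark, b = s·h and v = s²(1 − h²), applied to the figures reported for scorer 1 of the Writing Test (English) in Table D2.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorerParams:
    a: float  # mean score level for the scorer
    b: float  # scaling factor for the scorer
    v: float  # error (unique) variance for the scorer


def scorer_params(mean: float, sd: float, loading: float) -> ScorerParams:
    """Translate calibration statistics into model parameters.

    Assumed mapping: a = mean, b = sd * loading,
    v = sd**2 * (1 - loading**2).
    """
    return ScorerParams(
        a=mean,
        b=sd * loading,
        v=sd ** 2 * (1.0 - loading ** 2),
    )


# Scorer 1, Writing Test (English): mean 5.61, SD 1.86, loading 0.72.
scorer_1 = scorer_params(5.61, 1.86, 0.72)
```

Under this mapping the scorer's total variance splits into a common part and a unique part, so `scorer_1.b ** 2 + scorer_1.v` equals the squared standard deviation.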
SCORING THE DATA
Data. Suppose that for a particular essay we have a
(reduced) set of scorings:

$$y_p, \quad p = 1, \ldots, m'.$$

We want to estimate t for the essay.
Model. Following the model for the calibration data, we
assume that

$$y_p \sim N(a_p + b_p t,\; v_p), \quad p = 1, \ldots, m',$$

and that the "errors" are independent.


Derivation. The log-likelihood of the observed set of scores
is

$$\ell = -\frac{1}{2} \sum_p \frac{(y_p - a_p - b_p t)^2}{v_p} + \text{constant}.$$

We differentiate $\ell$ with respect to t and set to zero:

$$\sum_p \frac{b_p\,(y_p - a_p - b_p t)}{v_p} = 0.$$

The maximum-likelihood solution for t is

$$\hat{t} = \frac{\sum_p b_p (y_p - a_p) / v_p}{\sum_p b_p^2 / v_p}.$$
Substitution. For practical calculations, we substitute the
estimates of the parameters:

$$\hat{t} = \frac{\sum_p \dfrac{s_p h_p}{s_p^2 (1 - h_p^2)}\,(y_p - \bar{y}_p)}{\sum_p \dfrac{s_p^2 h_p^2}{s_p^2 (1 - h_p^2)}} = \frac{\sum_p \dfrac{h_p^2}{1 - h_p^2} \cdot \dfrac{y_p - \bar{y}_p}{s_p h_p}}{\sum_p \dfrac{h_p^2}{1 - h_p^2}}$$

where

$$\frac{h_p^2}{1 - h_p^2}$$

is the signal to noise ratio for scorer p and has the effect of
emphasizing the more reliable scores. Note also that the reliabilities
are weighting the estimates as follows:

$$\frac{y_p - \bar{y}_p}{s_p h_p}$$

so $1/h_p$ has the effect of expanding the scores of the less reliable
scorers.
IMPLEMENTATION


The essay scoring procedure was applied to four sets of essay marks:
those for the Writing Test (English); Test de composition écrite
(français); Writing Exercise (anglais) - Score on the Summary; and
Writing Exercise (anglais) - Score on the Commentary. The data required
to implement the procedure include, for each marker, the mean and
standard deviation of scores assigned to essays in the calibration
sample and the factor loading of the marker on the first common factor
of the matrix of intercorrelations among markers. (The correlations in
this matrix are compiled using the essays in the calibration sample as
observations.)


The data used to implement the essay scoring procedure for each of
the four sets of essay marks are reported in Table D2.1. Given these
figures, the raw scores assigned to an essay, and the identification
numbers of the individuals who marked the essay, the equations of the
preceding section can be used to derive the score for the essay. This
score can be regarded as a weighted average of raw scores, where the
weights make the necessary adjustment to allow for differences in the
mean, the variability and the consistency of the marks assigned by
different markers.
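
The weighted average described above can be sketched as follows. This is a hedged illustration, not the program used in the study: the function and variable names are hypothetical, the estimator is the maximum-likelihood form with weights h²/(1 − h²) applied to the rescaled deviations (y − mean)/(s·h), and the calibration values are the three legible rows for scorers 1, 3 and 4 of the Writing Test (English) in Table D2.1.

```python
def essay_score(marks, calibration):
    """Estimate the latent essay score t from the raw marks it received.

    marks: dict mapping scorer id -> raw mark given to this essay.
    calibration: dict mapping scorer id -> (mean, sd, factor loading).
    """
    numerator = 0.0
    denominator = 0.0
    for scorer_id, mark in marks.items():
        mean, sd, loading = calibration[scorer_id]
        # Signal-to-noise ratio: more reliable scorers get more weight.
        weight = loading ** 2 / (1.0 - loading ** 2)
        # Each scorer's own estimate of t, on the standardized scale.
        estimate = (mark - mean) / (sd * loading)
        numerator += weight * estimate
        denominator += weight
    return numerator / denominator


# (mean, SD, factor loading) for three scorers, from Table D2.1.
WRITING_TEST_ENGLISH = {
    1: (5.61, 1.86, 0.72),
    3: (5.86, 1.64, 0.81),
    4: (5.60, 1.85, 0.91),
}
```

A call such as `essay_score({1: 7, 4: 7}, WRITING_TEST_ENGLISH)` returns a value on the standardized scale (mean 0, variance 1 for the latent characteristic); scorer 4, with the higher loading, dominates the result.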


After all the essays had been scored, a final linear adjustment
was made to ensure that the distribution of scores on the essays exactly
covered the full scale of marks that the scorers had been told to use.
In the cases of the Writing Test (English), the Test de composition
écrite (français) and the Writing Exercise (anglais) - Score on the
Commentary, this scale of marks was 1 to 10. The scale of marks on the
Writing Exercise (anglais) - Score on the Summary ran from 0 to 10.
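
One way to read the final linear adjustment is as a mapping that sends the lowest estimated score to the bottom of the mark scale and the highest to the top. The sketch below illustrates that reading only; the report does not give the formula, and the function name is hypothetical.

```python
def rescale_to_mark_range(scores, low, high):
    """Linearly map scores so that min(scores) -> low and max(scores) -> high.

    Assumption: "exactly covered the full scale" means the extremes of the
    estimated scores coincide with the ends of the mark scale.
    """
    smallest, largest = min(scores), max(scores)
    if largest == smallest:
        raise ValueError("scores must not all be equal")
    slope = (high - low) / (largest - smallest)
    return [low + slope * (s - smallest) for s in scores]
```

For the three tests marked on a 1 to 10 scale the call would be `rescale_to_mark_range(scores, 1, 10)`; for the Summary score it would be `rescale_to_mark_range(scores, 0, 10)`.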
TABLE D2.1


Data Required to Implement the Essay
Scoring Procedure


Scorer Mean SD Factor Loading


Writing Test (English) 


1 5.61 1.86 0.72 
2 Be 25 Theale) 0.66 
3 5.86 1.64 0.81 
4 5.60 1.85 0.91 
5 Bo ae 2.43 O72 
6 5.60 SY 0.76 
7 4.91 1.83 0.70 
8 TGS. Dagh 0.66 
9 6,00 DOE 0.72 

10 Sib) 2.30 0.86 

1l 4.45 2.05 0.83 

yw 5.64 260 0.76 

13 Seal 2.46 0.68 

14 BG NE 0.65 

ue 5.81 2258 0.80 

16 Bou 1.78 0.81 

17 6.42 el ere 

18 5.92 2.00 0.68 

19 6.02 2.48 Qons 

20 4.69 eal 0.79 

21 base 2.50 0.87 

22 4.98 Le 0.60 

ae ary? 1.94 0.59 

24 Bes Die) 0.88 

25 aa DBE 0.67 

26 is ral O17g 

27 5.75 7780 0.54 

28 a? 2.29 0.74 

29 Beal ZL 0.76 

30 5.98 1.91 0.72 

Ail Do Cass Bean 

6.26 aca 0.86 

33 Dade 1.90 0.85 

34 6.25 Teh Daz? 

35 5.60 2BAOS 0.74 

36 6.19 2.64 0.76 
TABLE D2.1 (Continued)


Scorer Mean SD Factor Loading


Composition écrite (français)


1 DG Ati) OG 
2 Dew hell Sega Bais! 
3 Din Jt) ONE Wi8) 
4 6.20 es 7. 0286 
S; 6.20 ZelhZ O83 
6 62,50 lee Gik O89 
7 66> Ze29 0.68 
8 pee G Nh edly OZ 
g 62018 eet 16: Ones 
Writing Exercise (anglais) - Score on Summary 
1 SS ees | 0.46 
Z Disco 2 oO OB 
5 PSM AMS) VS O. 7-0 
4 AN 5) Hens he) ee ot 
5 oR oAG) SNS OAIt 
6 Beals AS Toms] 0.88 
Writing Exercise (anglais) - Score on Commentary
M Pret 2529 0.84 
Z DPR ot) oO O74 
3 2205 asd Orenr2 
4 rent) DDD, SES Ps 
A DiiOw, Licked O29 1 
6 Ue MAS) Tapeh tage es 


APPENDIX E 
STUDENT QUESTIONNAIRE 


Instructions 


In order for this study to be carried out properly, it is
important for us to have certain information about you, your
family, and your plans for the future. This information will
be kept completely confidential. It will be read only by the 
person responsible for computer coding your responses. When 
the information has been coded and stored in the computer, 
it will be identified with a code number. The list which 
matches code numbers with individual students will be 
accessible to only a small handful of researchers. 


There are some special instructions for completing the 
questionnaire. Please read these instructions carefully 
before you begin. 


1. NUMERICAL ANSWERS. In some cases you will be asked to
give your answer in numbers in the right-hand column of the
page. If you are giving an answer in numbers, please use
enough digits in your answer to fill all the spaces
provided. This will mean that in some cases you should put
zeros before your answer. For example, if your answer is
3, and two spaces are provided, you should write your
answer as "03".


2. VERBAL ANSWERS. In other cases, you will be asked to
give your answer in words in a space provided at the left
side of the page. You will notice that in the right-hand
column opposite each of these spaces there are parentheses.
DO NOT WRITE ANYTHING IN THESE PARENTHESES. They are for use
in translating your verbal answer into a number to be coded.


Please try to answer all questions as completely and
accurately as possible. Any omitted or unclear answers
affect the accuracy of the study results.


Thank you for your help. 
Student's Name


SECTION 1: PERSONAL INFORMATION
1A. What is your date of birth?

Day (2-digit number: 01, 02, ...)

Month (2-digit number: Jan = 01, Feb = 02, ...)

Year


1B. Write the following number: 1 if you are female 
2 if you are male 


SECTION 2: LANGUAGE 


2A. What language do your parents or guardians usually speak in
your home? Write the number corresponding to this language
in the list below.


1: French
2: English
3: Other


If you answered "Other" by writing the number 3, please 
write the name of the language here: 


2B. What language do you usually speak outside school and home,
with your friends and neighbours? Write the number
corresponding to this language in the list below.


1: French
2: English
3: Other


If you answered "Other" by writing the number 3, please 
write the name of the language here: 


Questions 2C, 2D, and 2E refer to full-time regular school
attendance, from kindergarten to your present grade. Do not
include information about special schools attended outside
regular school hours (e.g., on Saturdays, or on weekday
afternoons after your regular school day).


2C. How many years (counting the present year, if applicable)
have you spent in schools in which most or all
of your classes were taught in French? (Give as a
2-digit number: 00, 01, 02, ...)


2D. How many years (counting the present year, if applicable)
have you spent in schools in which most or all
of your classes were taught in English? (Give as a
2-digit number: 00, 01, 02, ...)
2E. How many years have you spent in schools in which most
or all of your classes were taught in a language other
than French or English? (Give as a 2-digit number:
00, 01, 02, ...)


Write the name(s) of the language(s) here:
SECTION 3: COUNTRY OF BIRTH


3A. In what country were you born? Write the number
corresponding to the country in the list below.


1: Canada
2: United States
3: Great Britain
4: France
5: Other


If you answered "Other" by writing the number 5, please 
write the name of the country here: 


ANSWER QUESTION 3B ONLY IF YOU WERE NOT BORN IN CANADA. IF
YOU WERE BORN IN CANADA, LEAVE QUESTION 3B BLANK AND GO
DIRECTLY TO SECTION 4.


3B. In what year did you come to Canada? 


In the next three sections of the questionnaire (Sections 4,
5, and 6), you will be asked for information about your
parents and/or guardians. We realize that in many cases the
adults who are responsible for you are people other than your
natural parents. Therefore, each of these questions has three
parts:


a) Information about your natural parents. If you are able
to, please give this information whether or not you
actually live with one or both natural parents.


b) Information about your adoptive parents. If you have
been legally adopted, please give this information
whether or not you actually live with one or both
adoptive parents.


c) Information about your guardians. Please give this
information if your present male or female guardian is
someone other than a natural or adoptive parent. For
example, you may have a stepfather or stepmother, or
may live with grandparents or in a foster home.


In these sections, leave an answer space blank only if the 
particular question is not applicable to your situation. For 
example, if you have an adoptive father, do not leave blank 
spaces for the questions relating to adoptive fathers, even 
if you do not have the required information. Instead, fill 
in the code number which matches the answer "Unknown". 
SECTION 4: PARENTS' AND/OR GUARDIANS' COUNTRIES OF BIRTH


BEFORE ANSWERING THE QUESTIONS IN THIS SECTION, PLEASE REFER
TO THE INSTRUCTIONS ON THE BACK COVER FOR GUIDANCE.


ANSWER THESE QUESTIONS BY WRITING THE NUMBER CORRESPONDING
TO THE CORRECT ANSWER IN THIS LIST.


1: Canada
2: United States
3: Great Britain
4: France
5: Other
6: Unknown


4A. In what country was your natural father born?

If you answered "Other" by writing the number 5, please
write the name of the country here:


4B. In what country was your natural mother born?

If you answered "Other" by writing the number 5, please
write the name of the country here:


4C. (Answer if you have an adoptive father) In what country
was your adoptive father born?

If you answered "Other" by writing the number 5, please
write the name of the country here:


4D. (Answer if you have an adoptive mother) In what country
was your adoptive mother born?

If you answered "Other" by writing the number 5, please
write the name of the country here:


4E. (Answer if your present male guardian is someone other
than your natural or adoptive father) In what country
was your present male guardian born?

If you answered "Other" by writing the number 5, please
write the name of the country here:


4F. (Answer if your present female guardian is someone other
than your natural or adoptive mother) In what country
was your present female guardian born?

If you answered "Other" by writing the number 5, please
write the name of the country here:
SECTION 5: EDUCATION OF PARENTS AND/OR GUARDIANS


BEFORE ANSWERING THE QUESTIONS IN THIS SECTION, PLEASE REFER
TO THE INSTRUCTIONS ON THE BACK COVER FOR GUIDANCE.


ANSWER THESE QUESTIONS BY WRITING THE NUMBER CORRESPONDING
TO THE CORRECT ANSWER IN THIS LIST. IF YOU ARE UNCERTAIN OF
THE CORRECT ANSWER, BUT HAVE SOME REASONABLY GOOD IDEA,
PLEASE PICK THE ANSWER YOU BELIEVE MOST LIKELY. ANSWER
"UNKNOWN" ONLY IF YOU HAVE NO KNOWLEDGE AT ALL OF THE CORRECT
ANSWER.


1: Unknown.

2: Did not complete elementary school.

3: Completed elementary school, but did not continue to
secondary school.

4: Attended secondary school, but did not graduate.

5: Graduated from secondary school, but did not continue
to a postsecondary educational institution (e.g.,
university, community college, art college, agricultural
college).

6: Attended a postsecondary educational institution, but
did not graduate.

7: Graduated from a postsecondary educational institution.


5A. Which statement most accurately describes the highest
level of your natural father's education?


5B. Which statement most accurately describes the highest
level of your natural mother's education?


5C. (Answer if you have an adoptive father) Which statement
most accurately describes the highest level of your
adoptive father's education?


5D. (Answer if you have an adoptive mother) Which statement
most accurately describes the highest level of your
adoptive mother's education?


5E. (Answer if your present male guardian is someone other
than your natural or adoptive father) Which statement
most accurately describes the highest level of your
present male guardian's education?


5F. (Answer if your present female guardian is someone other
than your natural or adoptive mother) Which statement
most accurately describes the highest level of your
present female guardian's education?


SECTION 6: OCCUPATIONS OF PARENTS AND/OR GUARDIANS 


BEFORE ANSWERING THE QUESTIONS IN THIS SECTION, PLEASE REFER 
TO THE INSTRUCTIONS ON THE BACK COVER FOR GUIDANCE. 


You will be asked below to give the occupations of your
parents and/or guardians. Please read these instructions
carefully before answering.


If you do not know the occupation of one of your
parents or guardians, write the word "unknown".


If possible, give a two-word description of the
occupation--e.g., "bus driver", "industrial accountant",
"telephone operator", "dairy farmer", "university student".


Try to make your description as specific as possible.
For example, an answer of "restaurant business" is
much less informative than one of "restaurant waiter"
or "restaurant owner". You need not include the name
of the employer. For example, write "sales clerk"
rather than "Eaton's clerk".


If one of your parents or guardians has more than one
occupation, list only the occupation at which he or
she spends the most time.


If the main occupation of one of your parents or
guardians is looking after the home, list his or
her occupation as "homemaker".


If one of your parents or guardians is deceased,
retired, or unemployed, write "deceased", "retired",
or "unemployed", followed by the name of his or her
most recent occupation.


6A. What is the occupation of your natural father?


6B. What is the occupation of your natural mother?


6C. (Answer if you have an adoptive father) What is
the occupation of your adoptive father?


6D. (Answer if you have an adoptive mother) What is the
occupation of your adoptive mother?


6E. (Answer if your present male guardian is someone other
than your natural or adoptive father) What is the
occupation of your present male guardian?


6F. (Answer if your present female guardian is someone other
than your natural or adoptive mother) What is the
occupation of your present female guardian?
SECTION 7: EDUCATIONAL PLANS (I)


The term "postsecondary educational institution" refers to 
any institution normally attended after leaving secondary 
school - for example, a university, a community college, 
an art college, or an agricultural college. 


7A. Suppose that in Ontario all postsecondary educational
programs now available in English were also
available in French. In what way, if any, would
this affect your present plans for postsecondary
education? Choose the most appropriate statement
below, and write the corresponding number in the
space.


1: There would be no change in my plans. 

2: I do not now plan to pursue postsecondary 
education. I would do so if an appropriate 
course were available in French. 

3: I now plan to pursue postsecondary education 
in English. I would pursue the same course 
in French if it were available. 

4: I now plan to pursue postsecondary education 
in English. I would pursue a different course 
in French if it were available. 

5: I now plan to pursue postsecondary education
in French. I would pursue a different course
in French if it were available.


7B. Beginning in September 1976, which of the following 
do you expect to be doing? Write the number that 
corresponds to your answer in the list. 


1: Attending secondary school.

2: Attending a postsecondary educational institution.

3: Working full time.

4: Other.


If you answered "Other" by writing the number 4, please
explain here what you expect to be doing:
IF YOU ANSWERED QUESTION 7B BY STATING THAT YOUR PLANS FOR
SEPTEMBER 1976 ARE "ATTENDING SECONDARY SCHOOL", GO ON TO
SECTION 8.


IF YOU ANSWERED QUESTION 7B BY STATING THAT YOUR PLANS FOR
SEPTEMBER 1976 ARE "ATTENDING A POSTSECONDARY EDUCATIONAL
INSTITUTION", GO ON TO SECTION 9.


IF YOU ANSWERED QUESTION 7B BY STATING THAT YOUR PLANS FOR
SEPTEMBER 1976 ARE "WORKING FULL TIME" OR "OTHER", GO ON TO
SECTION 10.
SECTION 8: EDUCATIONAL PLANS (II)


ANSWER THE QUESTIONS IN THIS SECTION ONLY IF YOU ANSWERED
QUESTION 7B BY STATING THAT YOUR PLANS FOR SEPTEMBER 1976
ARE "ATTENDING SECONDARY SCHOOL".


8A. Do you expect to begin attending a postsecondary
educational institution at some time before the
end of 1979? Write 1 for "yes", 2 for "no".


8B. (Answer if your answer to Question 8A was "yes")
About how many years do you expect to spend
studying at the postsecondary level? (Give as
a 2-digit number: 01, 02, ...)


GO ON TO SECTION 11.
SECTION 9: EDUCATIONAL PLANS (III)


ANSWER THE QUESTIONS IN THIS SECTION ONLY IF YOU ANSWERED
QUESTION 7B BY STATING THAT YOUR PLANS FOR SEPTEMBER 1976
ARE "ATTENDING A POSTSECONDARY EDUCATIONAL INSTITUTION".


9A. About how many years do you expect to spend
studying at the postsecondary level? (Give
as a 2-digit number: 01, 02, ...)


9B. Have you applied for 1976 admission to one or
more postsecondary educational institutions?
Write 1 for "yes", 2 for "no".


9C. (Answer if your answer to Question 9B was "yes")
List below, in order of your preference, the
institutions and programs to which you have applied
for 1976 admission. For each, please give the
following information:


Name of institution (e.g., University of
Toronto, Cambrian College)


Program applied for (e.g., Biology, Engineering,
Early Childhood Education)


Language in which the program is taught
(French or English)


Degree, diploma, certificate, etc., which you
hope to attain (e.g., B.A., Diploma in Agriculture)


1. Institution
   Program
   Language
   Degree, etc.


2. Institution
   Program
   Language
   Degree, etc.


3. Institution
   Program
   Language
   Degree, etc.


4. Institution
   Program
   Language
   Degree, etc.


5. Institution
   Program
   Language
   Degree, etc.


9D. Unless you give permission for their release, your marks
on any tests you write for this project will be given
only to you. However, you may choose to have your marks
given to your postsecondary institution for their
research purposes, for their use in counselling you, or
for both. If these marks are released to such an
institution with your consent, they will not be used in
making any decision about your admission to the
institution; in fact, in most cases the institution would not
receive the marks until after a decision on your
admission had already been made.


Are you willing to have your score(s) on any test(s)
you write for this project made available to your
postsecondary institution:

a) for research purposes? Write 1 for "yes", 2 for "no".


b) for use in counselling you? Write 1 for "yes", 2 for "no".


GO ON TO SECTION 11.
SECTION 10: EMPLOYMENT PLANS


ANSWER THE QUESTIONS IN THIS SECTION ONLY IF YOU ANSWERED
QUESTION 7B BY STATING THAT YOUR PLANS FOR SEPTEMBER 1976
ARE "WORKING FULL TIME".


10A. What job do you expect to be working at in September
1976? If possible, give a two-word description.


10B. Do you have a job offer for 1976? Write 1 for "yes",
2 for "no".


If you answered "yes" by writing the number 1, what
is the job? If possible, give a two-word description.
SECTION 11: CAREER PLANS


11A. Have you made a decision about what your eventual career
will be? Write 1 for "yes", 2 for "no".


If you answered "yes" by writing the number 1, what is
the career? If possible, give a two-word description.
PABLE LESS


Summary Statistics From the Regression 
Analyses of Marks in Mathematics and Courses 


Sua iE. 6 

Variable SiaDis B of B Ly N n 
SSHGD Analysis 
is Solan! Marks in 

Mathematics L276 
Zee hema 2e's 

Achievement Test Dos Wetiy Wo Oise cee Syl 
[PoC hoo Marks. aon 

Physics is Da 
ie rnysrTes 

Achievement Test Soe Wass W006 -CWseth B76 bz 
SSGD Analysis 
ls Seipowe ll WMeawwiks akin b 

Advanced Mathematics ESS) 
2. Vest of Arithmetic 

and. Basicr Algebra Gil ose (Oa OeGil —Giwe 4ue 
lis Soli@el Meyelkey sia < 

General Mathematics idl 336! 
2, Midas welminiene she 

and Basic Algebra 5) Iho OO » awe Oe UPB) sy 
Note: Statistics S.D., B--the raw score regression
coefficient, S.E. of B, and r--the correlation
coefficient, were derived from pooled within-school
variation and covariation. N is the number of students, n
the number of secondary schools.


ᵃ Average of marks in the calculus course and the functions
and relations course.


ᵇ Mark in the Grade 12 "foundations" mathematics course.


ᶜ Mark in the Grade 12 "applications" mathematics course.


IAB [Bai 
Means, Standard Deviations and Sample Sizes for 
Distribution of Achievement Test Scores and School Marks 
of SSHGD Students 


N 


Variable (3410)" “Mean =SeDs 
Test of Reading Comprehension and 
Lanquage Achievement (English) 
l. Reading Comprehension 

Parts '€(RC) 22 55 ee) 26 
2. First Language Achievement 

Psiesiee (uh) 2 4.4 Vases) 
3. Second Language Achievement 

Pets EGE) P25 Son 522 
4) Votale Teste vcr Ay) Wap Rays: LS 120 
Web nel eit CW) 569 Sees ies 
Tests of French as a Second Language 
I. Reading lest (FR) 360 hg ee 7.6 
2. Uistening@lést Crile 288 nish aeil 8.9 
Mathematics Achievement Test (MAT) 885 My? 6%-0) 
Physics Achievement Test (PAT) 401 15456 9.7 
School Marks 
ie Eno sh. Cr) 2678 68.0 hee 
2, Prencny Ce ) 690 PALS 14555 
5. Mathematies -OM) ZA TY: 68.6 TSS 
Ae Physics (Ce) 1168 68.4 hoes 


ᵃ This was the number of students drawn in the original
sample. The difference between this number and the number
of students who took the Test of Reading Comprehension
and Language Achievement is accounted for by a small
number of unusable test records and absenteeism on the
day of the test.
APPENDIX G
TABLES OF STATISTICS REFERRED TO IN THE
SUBSECTION ON MARKING STANDARDS (CHAPTER 3.2.2)
OF THE SSGD/SSHGD SURVEY
OF FRANCOPHONE STUDENTS
APPENDIX H


TABLES OF STATISTICS REFERRED TO IN THE
REPORT OF RESULTS FROM THE
UNIVERSITY SURVEY


CCHAPTER &, 


for 


IG. 


fa Ue 


Sree) 


Number 
Codes 
Universities 
Brock 
Carleton
Guelph 
Lakehead 
Laurentian 
McMaster 
Ottawa 
Toronto 
Trent
Waterloo 


Windsor 
Number
of
Courses


13 
12 
ist 


10 


TABLE H.1


Frequency Distribution of the Number of
Humanities/Arts Courses Completed by Students
From Each University


JN 


Jw 


13 


40 


University 


ig 2 

3 

1 1 

1 

1 4 

toe ae 
2 1 


& 


They, 


25 


i 


Kee) 


|\o 


MS: 


42 


28 


a9 


TABLE H.1 (Continued)


Number University 

Olle 
ie 2 2 Se eG ee te 
Mein ease Te ee al 1d me Tele ate fia eae eee |) Ota 
hay) WSU Mis lsc) OA? teat Me a col? Vi ta7) ASS lege wee 
nP Diese te) Bou GaAGo sy) Gi Ole Oo le ewe 
nc da eoe) eh Shy Showy 188 BY In 3m Geese 


Note: No distinction was made in this tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃ This number was assigned to students who were shown to be
enrolled in at least one Humanities/Arts course but who
failed to receive a mark in the course(s).

ᵇ Number of students enrolled in at least one Humanities/Arts
course.


ᶜ Number of students enrolled in other courses but not
enrolled in even one Humanities/Arts course.
TABLE H.2


Frequency Distribution of the Number of
Social Science Courses Completed by
Students From Each University


Number University 
of 
Coubise sl Kia Mee Ba be eee Bi, Me 
2) 
8 1 
fs 2 
6 2 Zz ii 
5 1 17 5 1 7 
4 1 ae 5 1 5 is 
3 4 1D 22 1B3) oi: 8 56 
2 v) Sal Le 10 26 20 90 
1 1 oJ) 8 6 De 15 154 
ae 2 3 2 5 ris) 


Dee 


hes: 


1 


10 


TABLE H.2 (Continued)


Number University 

of 
Courses 1 2 3 4 5 6 7 8 2 10 a 
Mean Die mee Out ye ai eeee Om ereklay eC cde Ol ith teeta eae 
53.0 ee ibs Pen SOULS ONES et Pe lee fav adtus eee on te ce ten it 
nP 30 76 125 «64a 30 LD De ho Baie Be nie ta ee si) 
N° 11 3 ie 6 12 25 142 a9 19 


Note: No distinction was made in this tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃThis number was assigned to students who were shown to be
enrolled in at least one Social Science course but who
failed to receive a mark in the course(s).


ᵇNumber of students enrolled in at least one Social Science
course.


ᶜNumber of students enrolled in other courses but not taking
even one Social Science course.


TABLE H.3


Frequency Distribution of the Number of
Science Courses Completed by Students
From Each University


Number University
of
Courses 1 2 3 4 5 6 7 8 9 10 11


14


13


12


11


10
1 

Z 

7 

8 

Hi 6 

2 

5 PZ 

16 Ze LS 

» 24 i 1 1 1) 30 
ie y/ 15) 7 14 6 ee | it el: 
3 4 ee 4 2 20 si) 62 de/, 
Z WV? 26 Z 1 a hes 6 193 1 HZ 
Ams Hits Tet 5 3 BD ie hee -: Za 
9 6 3 v4 6 y 29, 1 Zi 
TABLE H.3 (continued)


Number University 

of 
Courses 1 2 3 4 5 6 7 8 9 10 11
Mean Tae | LHe DAR SS2I 2 NIBP See lS eC RI Ie aR 28 
ns yD 1 6 eee tli: AGG Gis Ce. oe, a ee ect nace meee 
nP 30 A ae a lf UGA ZT 84 7. -ONeZ 4 164 56 
N° 40 iy 5) ) 63 SOS 1} Bat) Vide’ 


Note: No distinction was made in this tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃThis number was assigned to students who were shown to be
enrolled in at least one Science course but who failed to
receive a mark in the course(s).


ᵇNumber of students enrolled in at least one Science course.


ᶜNumber of students enrolled in other courses but not enrolled
in even one Science course.
TABLE H.4


Frequency Distribution of the Number of
Professional Courses Completed by
Students From Each University


Number University 
of 
Courses 1 2 3 4 5 6 7 8 9 10 11
10 6 
9 | 5 y 
8 1 
1 5 
6 ih il 19 
5 2 1 6 3 it ie ue 
4 3 4 B 1 3 Bal; 6 
3 8 8 5 Ji 3) 2 34 17 
Zz ile 30 4 8 Zi I 40 28 
1 14 1B) 30 S 8 iby Ae Te stats) Z Ze? 
Os 1 3 1 5 eo se 5 
TABLE H.4 (continued)


Number University 
of 
Courses 1 2 3 4 5 6 7 8 9 10 11


Mean ro ‘lt ae Re lass gue cs tg Fe ime ae ma | aT aie a Fowl oes) TON rok mane | ng 


ie cia lk Ba eS he se eae tl Del eee 
nP 14 eal ie) Towel od BS. 974 eS) eae 
n° We 26 5] 8 9 B6 Bs wee NT 13 69 DS 


Note: No distinction was made in this tabulation between 
courses in terms of duration (one term, two terms, etc.), 
credit value, or hours of instructional time. 


ᵃThis number was assigned to students who were shown to be
enrolled in at least one Professional course but who failed
to receive a mark in the course(s).


ᵇNumber of students enrolled in at least one Professional
course.


ᶜNumber of students enrolled in other courses but not
enrolled in even one Professional course.
TABLE H.5


Frequency Distribution of the Number of
Humanities/Arts Courses Completed by
Students From Each Program Area


Program Area 


Number 
of Humanities/ Social
Courses Arts Science Science Professional
i) 1 
iz 
iba 
10 2 
9 1 
8 4 
7 dat 
6 2 
5 36 5) 
4 pe) 10 v4 2 
3 65 Nay, Bs 3 
2 29 D7 oa Ue8 
i Z 135 138 46 
os ie WT 5 


TABLE H.5 (continued) 


Program Area 


Number 
of Humanities/ Social 
Courses Arts Science Science Professional
Mean 3 atl tb a7 ee Ig 
Sa) leas ORE, 0.6 Oras 
ne 231 255 187 138 
Nie a2 284 79 


Note: No distinction was made in this tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃThis number was assigned to students who were shown to be
enrolled in at least one Humanities/Arts course but who
failed to receive a mark in the course(s).


ᵇNumber of students enrolled in at least one
Humanities/Arts course.


ᶜNumber of students enrolled in other courses but not
enrolled in even one Humanities/Arts course.




TABLE H.6
Frequency Distribution of the Number of 


Social Science Courses Completed by 
Students From Each Program Area 


Program Area 


Number 
of Humanities/ Social 
Courses Arts Science Science Professional
9 4 
8 5 
7 2) 
6 ZT 
5 il 44 4 2 
4 a 69 Bi 6 
3 20 Any, be) 18 
Z 65 7 13 49 
i. 75 2a 139 72 
o% 9 i? 20 6 
TABLE H.6 (continued)


Program Area 


Number 
of Humanities/ Social 
Courses Arts Science Science Professional
Mean Aw Sie) S ih ete ie 7 
SD OR Ik 6 (é 1 2 ihe (8, 
N° Wiz 387 Ha 153 
ne 54 170 64 


Note: No distinction was made in this tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃThis number was assigned to students who were enrolled in at
least one Social Science course but who failed to receive
a mark in the course(s).


ᵇStudents were classified into program areas on the basis
of the courses in which they were enrolled, not on the
basis of courses completed.


ᶜNumber of students enrolled in at least one Social
Science course.

ᵈNumber of students enrolled in other courses but not
enrolled in even one Social Science course.
TABLE H.7


Frequency Distribution of the Number of
Science Courses Completed by
Students in Each Program Area


Program Area 


Number 
of Humanities/ Social 

Courses Arts Science Science Professional
14 ih 
iS) 2 
ty 
dal 7 
10 8 
9 7 
8 5 
7 ie 
6 SL Za 
5 iL 59 20 
4 5 a7 ine 
3 1 10 106 Zu 
2 2 46 66 36 
1 61 12.0 28 61 
0° 14 723 ia? 15 


TABLE H.7 (continued)


Program Area 


Number 
of Humanities/ Social 
Courses Arts Science Science Professional 
Mean ile 8 eee CG DAS 
Sib OraG Ore 223 gees 
n° 88 203 Gat 184 
no 143 184 53 


Note: No distinction was made in this tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃThis number was assigned to students who were shown to be
enrolled in at least one Science course but who failed to
receive a mark in the course(s).

ᵇStudents were classified into programs on the basis of
the courses in which they were enrolled, not on the basis
of courses completed.


ᶜNumber of students enrolled in at least one Science
course.


ᵈNumber of students enrolled in other courses but not
enrolled in even one Science course.


TABLE H.8 
Frequency Distribution of the Number of 


Professional Courses Completed by 
Students in Each Program Area 


Program Area 


Number 
of Humanities/ Social
Courses Arts Science Science Professional 
10 6 
y 14 
8 2 
Hh 7 
6 3 2 > 
5 1 4 Wed 36 
4 > 14 fal 26 
b} 4 18 py 38 
2 Ba 34 Onl 43 
1 ZL 8 3 £0 13 
ov 4 19 28 ve 
TABLE H.8 (continued)


Program Area 


Number 
of Humanities/ Social 
Courses Arts Science Science Professional
Mean olen Wag il fe (i al 
Sh5 LD eZ ope le Ss 72 th 
n° 47 172 311 vaaley| 
n¢ 184 215 160 


Note: No distinction was made in the tabulation between
courses in terms of duration (one term, two terms, etc.),
credit value, or hours of instructional time.


ᵃThis number was assigned to students who were enrolled in at
least one Professional course but who failed to receive a
mark in the course(s).


ᵇStudents were classified into program areas on the basis
of the courses in which they were enrolled, not on the
basis of courses completed.


ᶜNumber of students enrolled in at least one Professional
course.

ᵈNumber of students enrolled in other courses but not
enrolled in even one Professional course.
Means


WA BME. le 


and Standard Deviations of the Distributions of
School Means on the Variables Included in the SSGD Study


of Factors Related to School Achievement 


(Number of Schools 


Variable


Residuals (Observed Mean - Predicted Mean)


1. Test of Reading Comprehension and
Language Achievement (English)
2. Writing Test
3. Test of Arithmetic
and Basic Algebra
4. Future Plans


Predicted Means


5. Test of Reading Comprehension and
Language Achievement (English)
6. Writing Test
7. Test of Arithmetic
and Basic Algebra
8. Future Plans


Observed Means


9. Test of Reading Comprehension and
Language Achievement (English)
10. Writing Test
11. Test of Arithmetic
and Basic Algebra
12. Future Plans
13. Age
14. Sex
15. Language Spoken in the Home
TABLE I.7


Means and Standard Deviations of the Distributions of
School Means on the Variables Included in the SSHGD Study 
of Factors Related to School Achievement 
(Number of Schools = 52) 


Variable Mean SD 


Residuals (Obtained Mean-Predicted Mean) 


1. Test of Reading Comprehension and
Language Achievement (English)


2. Future Plans 0.00 0.08


Predicted Means 


3. Test of Reading Comprehension and
Language Achievement (English) L758 [37
4. Future Plans . (Ay 18 ORS


Observed Means 


5. Test of Reading Comprehension and
Language Achievement (English) 75.8 DES


6. Future Plans Lee 2 0.08
7. Writing Test Gaile Os
8. Mathematics Achievement Test J Eigen Wes: 2s OU
9. Age 188 0 O20
10. Sex Teeu’2 Onl Ss
11. Language Spoken in the Home enol Oey,
12. Education of Father/Male Guardian 4.66 Se,
13. Education of Mother/Female Guardian bbe fo!) lies a 
14. Occupation of Father/Male Guardian ae faa O257 
15. Occupation of Mother/Female Guardian 0,40 G07 


TABLE I.7 (continued)


Variable


16. Total Credits
17. Total Credits in English
18. Total Credits in Mathematics
19. Number of SSHGD Courses in Languages
20. Number of SSHGD Courses in History/
Geography/Social Science
21. Number of SSHGD Courses in
Mathematics/Science
22. Number of Other SSHGD Courses
23. Total Number of SSHGD Courses


1.Q3 


ᵃThe number of schools for the Writing Test was 50.